
# Document Classifier

> Use AI to analyze the text of input documents and classify them into categories such as invoices, orders, or industry-specific types. This is particularly useful for quickly identifying the source of a document. You can also define custom classification rules to tailor the analysis to specific needs.

<Frame>
  <img src="https://mintcdn.com/pdfco/tXGo3rbTS_pEF5es/images/integrations/make/make-step9.png?fit=max&auto=format&n=tXGo3rbTS_pEF5es&q=85&s=4d0bb35deacf3003d36067031be71eb3" alt="Make Step" width="896" height="964" data-path="images/integrations/make/make-step9.png" />
</Frame>

## Input

| Name               | Description                                                                        | Required |
| ------------------ | ---------------------------------------------------------------------------------- | -------- |
| **Import Options** | Choose the input source, either `Upload a File` or `Import PDF or Image from URL`. | Yes      |

***

**Upload a File**

| Name     | Description                                                                                                                                                                          | Required |
| -------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ | -------- |
| **Data** | Upload a file using raw binary data from another module. Note: This requires additional credits because the file is first uploaded to [PDF.co Temporary Files Storage](/api-reference/file-upload). | Yes      |

**Import PDF or Image from URL**

| Name    | Description                                                                                                                                                                                                                                               | Required |
| ------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| **URL** | Provide the URL to the source **PDF** document, or a `filetoken://` link from [PDF.co Built-In Files Storage](https://app.pdf.co/files). If you use another cloud service such as **Google Drive** or **Dropbox**, ensure the link is publicly accessible. | Yes      |

| Name                                    | Description                                                                                                                                                                                                                                                                 | Required |
| --------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- |
| **Set custom rules**                    | Optionally, define classification rules in **CSV** format. Each row should be formatted as `classname,logic,keyword1,keyword2`. Example: `Amazon,AND,Amazon AWS,AWS Invoice`. For detailed instructions, refer to [PDF Classifier](https://pdf.co/pdf-classifier).          | No       |
| **Load custom rules from CSV via url**  | Provide a link to a **CSV** containing custom classification rules. Each row should be formatted as `classname,logic,keyword1,keyword2`. Example: `Amazon,AND,Amazon AWS,AWS Invoice`. For detailed instructions, refer to [PDF Classifier](https://pdf.co/pdf-classifier). | No       |
| **Case Sensitive Custom Rules Enabled** | Specify whether the keywords in custom rules should be case sensitive.                                                                                                                                                                                                      | No       |
| **Execution Mode**                      | Select **Sync** for small tasks that take up to `10` seconds. Choose **Async** for standard jobs, or **Async For Large Docs** for tasks that take over `30` seconds. Use the **Job Check** module to retrieve results for large tasks.                                     | No       |
| **Profiles**                            | Add custom options for the process in a `JSON` string format. See [API Profiles](#profiles) for more details.                                                                                                                                                               | No       |

### Integrating External File Sources

<Note>
  Streamline your **Make** workflows with external file sources like **Google Drive** and **Dropbox** using their unique actions. Discover efficient integration strategies in our guide: [File Source Integrations in Make](/integrations/make/input-file-sources).
</Note>

## Output

| Name                  | Description                                                                                                                       |
| --------------------- | --------------------------------------------------------------------------------------------------------------------------------- |
| `url`                 | This is the temporary **URL** provided by the **PDF.co** file server.                                                             |
| `Body`                | Contains the identified document categories, listed in a `classes` string array.                                                  |
| `Status`              | Indicates the [response status](/api-reference/introduction) code. A `success` status is returned if the operation is successful. |
| `outputLinkValidTill` | Specifies the timestamp until which the `url` remains accessible.                                                                 |
| `error`               | Provides details about any errors encountered during the process, if applicable.                                                  |
| `File Name`           | The designated name of the output file.                                                                                           |
| `Job Id`              | A unique identifier assigned to the job.                                                                                          |
| `credits`             | The number of credits used for the process.                                                                                       |
| `Remaining Credits`   | Displays the balance of credits available in your account.                                                                        |
| `duration`            | How long the process took to complete.                                                                                            |

### Profiles

<Warning>
  To display the Profiles fields, you must **enable Advanced Settings** by clicking the toggle:

  <Frame>
    <img src="https://mintcdn.com/pdfco/tXGo3rbTS_pEF5es/images/integrations/make/show-advanced-settings.png?fit=max&auto=format&n=tXGo3rbTS_pEF5es&q=85&s=8a9777bce100fedc593f641fa3140bd2" alt="Advanced Settings" width="558" height="70" data-path="images/integrations/make/show-advanced-settings.png" />
  </Frame>
</Warning>

You can set additional options for the operation used in the [PDF.co](http://pdf.co/) module by using **Profiles**. A profile is a string in JSON-like format containing predefined parameters.

Here’s an example of a Custom Profiles input:

```json theme={null}
{ "outputDataFormat": "base64" }
```

With this input, the [PDF.co](http://pdf.co/) module will return the output in base64 format. You can find the list of available parameters for customizing profiles in the [PDF.co](http://pdf.co/) operation documentation below:

<Tip>
  You can use any regular API parameter from the [API Reference](/api-reference) within Make using the `std_params` feature in profiles. The `std_params` feature lets you define regular API parameters in JSON format. See [Standard Parameters](/api-reference/profiles#standard-parameters) for detailed documentation and examples.
</Tip>

| Parameter                      | Type                          | Default   | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| ------------------------------ | ----------------------------- | --------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `RenderTextObjects`            | boolean                       | `true`    | Render text objects or not                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `RenderVectorObjects`          | boolean                       | `true`    | Render vector objects or not                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       |
| `RenderImageObjects`           | boolean                       | `true`    | Render image objects or not                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `TIFFCompression`              | string                        | `LZW`     | TIFF compression algorithm. The options are: None, LZW, CCITT3, CCITT4, RLE                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `OCRMode`                      | string                        | `Auto`    | Specifies how OCR (Optical Character Recognition) should process input content, offering various modes to tailor text extraction based on content type such as images, fonts, and vector graphics. For more information, see OCR Extraction Modes.                                                                                                                                                                                                                                                                                                                                 |
| `OCRResolution`                | integer                       | `300`     | Use this parameter to change the OCR resolution from the default 300 dpi. The range is from 72 to 1200 dpi.                                                                                                                                                                                                                                                                                                                                                                                                                                                                        |
| `RotationAngle`                | integer                       | -         | Use manual rotation to handle PDFs with vertically drawn text. Normally, OCR automatically detects page rotation in PDFs and extracts text accurately. However, in some cases the page itself is not rotated; rather, the text is drawn vertically, and auto-detection may fail. In such cases, use this parameter to set the page rotation manually. The available angles are: 0, 1, 2, 3. |
| `LineGroupingMode`             | string                        | `None`    | Controls line grouping in PDF text extraction. Modes: None (no grouping), GroupByRows (merge rows if all cells align), GroupByColumns (merge cells by column), JoinOrphanedRows (merge single-cell rows to above if no separator).                                                                                                                                                                                                                                                                                                                                                 |
| `ConsiderFontColors`           | boolean                       | `false`   | Controls whether font colors should be considered when detecting table structure and merging text objects during PDF extraction. Set to true to consider font colors.                                                                                                                                                                                                                                                                                                                                                                                                              |
| `DetectNewColumnBySpacesRatio` | string                        | `1.2`     | Controls how spaces between words are interpreted for column detection in PDF text extraction. It defines the ratio of space width that determines when text should be treated as being in separate columns.                                                                                                                                                                                                                                                                                                                                                                       |
| `AutoAlignColumnsToHeader`     | boolean                       | `true`    | Controls how columns are detected and aligned during table extraction from PDF documents. It affects both table structure detection and text extraction with formatting preservation. When set to true (default), the row with the most columns is used as the header, and all other rows are aligned to this structure, which is ideal for well-structured tables. When set to false, columns are analyzed independently across all rows to build the structure, which works better for inconsistent or irregular tables. |
| `OCRImagePreprocessingFilters` | object                        | -         | Image preprocessing filters for OCR. Refer to [OCRImagePreprocessingFilters](#ocrimagepreprocessingfilters) for usage examples.                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
|     `.AddGrayscale`            | boolean                       | `false`   | Converts to grayscale before OCR.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
|     `.AddGammaCorrection`      | array\[string (float format)] | `["1.4"]` | Adds a gamma correction filter.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    |
| `DataEncryptionAlgorithm`      | string                        | -         | Controls the encryption algorithm used for data encryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256.                                                                                                                                                                                                                                                                                                                                                                                                             |
| `DataEncryptionKey`            | string                        | -         | Controls the encryption key used for data encryption. See User-Controlled Encryption for more information.                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `DataEncryptionIV`             | string                        | -         | Controls the encryption IV used for data encryption. See User-Controlled Encryption for more information.                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
| `DataDecryptionAlgorithm`      | string                        | -         | Controls the decryption algorithm used for data decryption. See User-Controlled Encryption for more information. The available algorithms are: AES128, AES192, AES256.                                                                                                                                                                                                                                                                                                                                                                                                             |
| `DataDecryptionKey`            | string                        | -         | Controls the decryption key used for data decryption. See User-Controlled Encryption for more information.                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
| `DataDecryptionIV`             | string                        | -         | Controls the decryption IV used for data decryption. See User-Controlled Encryption for more information.                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |
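
As an illustration, the profile below combines several parameters from the table. It is only a sketch: it assumes that multiple parameters can be combined as top-level keys in the same flat shape as the `outputDataFormat` example above, and the values are illustrative choices within the documented options, not recommended settings.

```json theme={null}
{
  "OCRResolution": 600,
  "LineGroupingMode": "GroupByRows",
  "ConsiderFontColors": true
}
```

Here `OCRResolution` is raised from the default 300 dpi (the documented range is 72 to 1200), `LineGroupingMode` is switched from `None` to `GroupByRows`, and `ConsiderFontColors` is turned on. Any parameter left out of the profile keeps the default shown in the table.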

You can also use `Custom Profiles` to:

* [Disable the text layer](https://docs.pdf.co/api-reference/pdf-to-image/png#disable-text-layer).
* [Set image preprocessing filters](https://docs.pdf.co/api-reference/pdf-to-image/png#ocrimagepreprocessingfilters).

For general information on Profiles usage, see [Profiles](https://docs.pdf.co/api-reference/profiles).
