# Profiles

> This page describes the `profiles` parameter that can be used with your API calls.

Profiles set extra options for common API calls. Some profile options are specific to a particular API.

Profiles are written in a `JSON`-like notation and passed as a string in the `profiles` field of your API calls.

<Warning>Please note that the value for the `profiles` field in the code snippets must be enclosed in quotes (`"`), making it a complete string. For example: `{ "profiles": "{'TrimSpaces':true, 'PreserveFormattingOnTextExtraction': true}"}`</Warning>

## Sample Code

<Tabs>
  <Tab title="Postman">
    ```json theme={null}
    {
     "profiles": "{'TrimSpaces':true, 'PreserveFormattingOnTextExtraction': true}"
    }
    ```
  </Tab>

  <Tab title="Python">
    ```
    profiles = '{ "TrimSpaces": "True", "PreserveFormattingOnTextExtraction": "True" }'
    ```
  </Tab>

  <Tab title="C#">
    ```json theme={null}
    {
     "profiles": "{ 'TrimSpaces': 'True', 'PreserveFormattingOnTextExtraction': 'True' }"
    }
    ```
  </Tab>

  <Tab title="Java">
    ```
    String profiles = "{ 'TrimSpaces': 'True', 'PreserveFormattingOnTextExtraction': 'True' }";
    ```
  </Tab>

  <Tab title="JavaScript">
    ```
    const Profiles = "{ 'TrimSpaces': 'True', 'PreserveFormattingOnTextExtraction': 'True' }";
    ```
  </Tab>

  <Tab title="Powershell">
    ```
    $Profiles = '{ "TrimSpaces": "True", "PreserveFormattingOnTextExtraction": "True" }'
    ```
  </Tab>

  <Tab title="VB.NET">
    ```json theme={null}
    {
     "profiles": "{ 'TrimSpaces': 'True', 'PreserveFormattingOnTextExtraction': 'True' }"
    }
    ```
  </Tab>

  <Tab title="CURL">
    ```json theme={null}
    {
     "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-json/sample.pdf",
     "inline": true,
     "profiles": "{ 'TrimSpaces': 'True', 'PreserveFormattingOnTextExtraction': 'True' }"
    }
    ```
  </Tab>
</Tabs>
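
Hand-writing the nested string invites quoting mistakes. The following is a minimal Python sketch that generates it instead, assuming the API accepts standard double-quoted JSON inside the `profiles` string (as the Powershell tab above does). `build_profiles` is a hypothetical helper name, not part of any PDF.co SDK.

```python
import json


def build_profiles(**options):
    """Serialize profile options into the string expected by the `profiles` field."""
    # json.dumps emits true/false for Python booleans and double-quoted keys.
    return json.dumps(options)


# The outer payload is serialized a second time when the request body is built,
# so `profiles` ends up as a JSON string nested inside the JSON body.
payload = {
    "profiles": build_profiles(TrimSpaces=True, PreserveFormattingOnTextExtraction=True),
}
body = json.dumps(payload)
```

Serializing twice is what keeps the `profiles` value a complete string, as the warning above requires.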

***

## Generic Profile Options

The following `profiles` options are not specific to any one particular endpoint.

### Standard Parameters

The `std_params` object within the `profiles` parameter lets you define regular API parameters in `JSON` format. It is designed to simplify passing standard parameters and additional options through the `profiles` parameter of PDF.co API requests.

When using [Standard Parameters](#standard-parameters), you can enable webhooks by setting the `callback` object to the URL of your choice. However, it is simpler to set the `callback` object directly; see [Webhooks & Callbacks](/api-reference/webhooks) for more.

<Note>When `std_params` is used in the `profiles` parameter and a parameter is defined both inside `std_params` and outside `profiles`, the value specified in `std_params` overwrites the duplicate value. Therefore, if you define a callback object in `std_params`, it will overwrite any value you may have defined via [the basic callback object](/api-reference/webhooks)!</Note>
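
The precedence rule in the note can be pictured with a small Python sketch. It only illustrates the documented behaviour from the client's point of view; it is not PDF.co's server code, and `effective_params` and the example URLs are hypothetical.

```python
import json


def effective_params(request_body):
    """Return the parameters that win after `std_params` is applied."""
    # Top-level parameters, excluding the profiles string itself.
    merged = {k: v for k, v in request_body.items() if k != "profiles"}
    profiles = json.loads(request_body.get("profiles", "{}"))
    # Values in std_params overwrite duplicates defined outside profiles.
    merged.update(profiles.get("std_params", {}))
    return merged


request_body = {
    "callback": "https://example.com/top-level-hook",
    "profiles": json.dumps({"std_params": {"callback": "https://example.com/std-hook"}}),
}
winner = effective_params(request_body)["callback"]
```

Here `winner` is the URL from `std_params`, which is why defining the callback in only one place avoids surprises.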

#### `std_params` Structure

* **Description**: Contains key-value pairs of standard parameters that will be used across PDF.co API requests.

* **Type**: `JSON` Object (passed as a string)

* **Example**:

```json theme={null}
{
 "profiles": "{'std_params': {'callback': 'webhook_url'}}"
}
```

#### Practical Application

Using the `std_params` profile, you can define a set of standard parameters and configurations that will be consistently applied across your PDF.co API requests. This approach is particularly beneficial when using automation platforms like [Zapier](/integrations/zapier/), [Make](/integrations/make/), and others, where the number of parameters you can pass directly is limited.

#### Complete Request Example

Here is a complete example illustrating the use of the `std_params` profile with other parameters:

Endpoint: `/pdf/convert/to/text`

```json theme={null}
{
 "url": "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
 "inline": true,
 "profiles": "{'std_params': {'callback': 'webhook_url', 'async': true}, 'ExtractShadowLikeText': false, 'ExtractColumnByColumn': true, 'OCRMode': 'Auto'}",
 "TrimSpaces": true,
 "PreserveFormattingOnTextExtraction": true
}
```
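
A hedged Python sketch of preparing a request like this with only the standard library follows. The base URL `https://api.pdf.co/v1`, the `x-api-key` header, the placeholder key, and the webhook URL are assumptions not stated on this page; check them against your account before use. The request is built but not sent.

```python
import json
import urllib.request

API_BASE = "https://api.pdf.co/v1"  # assumption: not stated on this page
API_KEY = "YOUR_API_KEY"            # placeholder


def build_text_request(source_url, callback_url):
    """Prepare a POST to /pdf/convert/to/text that passes options via std_params."""
    profiles = {
        "std_params": {"callback": callback_url, "async": True},
        "ExtractShadowLikeText": False,
        "ExtractColumnByColumn": True,
        "OCRMode": "Auto",
    }
    payload = {
        "url": source_url,
        "inline": True,
        # profiles must be a string, so it is serialized separately.
        "profiles": json.dumps(profiles),
    }
    return urllib.request.Request(
        API_BASE + "/pdf/convert/to/text",
        data=json.dumps(payload).encode("utf-8"),
        headers={"x-api-key": API_KEY, "Content-Type": "application/json"},
        method="POST",
    )


request = build_text_request(
    "https://pdfco-test-files.s3.us-west-2.amazonaws.com/pdf-to-text/sample.pdf",
    "https://example.com/webhook",
)
# urllib.request.urlopen(request) would send it; that step is omitted here.
```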

### Output as Base64

If you require your output as `base64`, use the following:

```json theme={null}
{
 "profiles": "{ 'outputDataFormat': 'base64' }"
}
```

<Warning>This output data format is supported by endpoints that generate binary files - **PDF** and images. The output is accessible via a generated link and the file under the link is in a base64-encoded text format.</Warning>
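
Because the linked file holds base64 text rather than raw bytes, it needs one decoding step after download. A minimal Python sketch, assuming the text has already been fetched; `decode_output` is a hypothetical helper and the sample string is a stand-in for real output.

```python
import base64


def decode_output(base64_text):
    """Turn the base64 text stored under the result link back into binary data."""
    # By default b64decode discards characters outside the base64 alphabet,
    # such as line breaks, before decoding.
    return base64.b64decode(base64_text)


# Stand-in for downloaded content: an encoded PDF header.
sample_text = base64.b64encode(b"%PDF-1.7").decode("ascii")
binary = decode_output(sample_text + "\n")
```

The resulting bytes can then be written to disk in binary mode.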

### Converting PDFs

There are a variety of `profiles` options which can be set when converting from **PDF** to other documents. These `profiles` control how to extract the information from the source **PDF** file.

These options apply to the following endpoints:

* /pdf/convert/to/csv
* /pdf/convert/to/xml
* /pdf/convert/to/json
* /pdf/convert/to/json2
* /pdf/convert/to/xls
* /pdf/convert/to/xlsx

#### Convert Vectors

You can choose whether the conversion process should convert vectors or not as follows:

```json theme={null}
{
 "profiles": "{ 'SaveVectors': true }"
}
```

#### Save Images

The `SaveImages` option of the `profiles` parameter extracts individual images from a regular **PDF**.

```json theme={null}
{
 "profiles": "{ 'SaveImages': 'Embed' }"
}
```

#### Consider Font Size

This `profiles` option allows you to separate header and body text based on font size.

```json theme={null}
{
 "profiles": "{ 'ConsiderFontSizes': true }"
}
```

#### Set the Extraction Area

Extract text in a specific area by defining the extraction area - set with points in the format `[x, y, width, height]`.

```json theme={null}
{
 "profiles": "{ 'ExtractionArea': [171.0,69.0,249.75,71.25] }"
}
```
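
If a region is known by its corner coordinates instead, it can be converted to the `[x, y, width, height]` form first. A minimal Python sketch; `extraction_area` is a hypothetical helper, and the corner values are chosen only so that the result matches the example above.

```python
import json


def extraction_area(x_min, y_min, x_max, y_max):
    """Convert two corner points (in points) to [x, y, width, height]."""
    if x_max <= x_min or y_max <= y_min:
        raise ValueError("max coordinates must be greater than min coordinates")
    return [x_min, y_min, x_max - x_min, y_max - y_min]


area = extraction_area(171.0, 69.0, 420.75, 140.25)
profiles = json.dumps({"ExtractionArea": area})
```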

#### Extract Hyperlinks

Extract hyperlinks (URLs) from a PDF document by using the `OutputStructure` and `OutputTransformation` profile options. This returns only the link objects found in the PDF.

```json theme={null}
{
 "profiles": "{ 'OutputStructure': 'OnlyLinks', 'OutputTransformation': '$..text' }"
}
```

* `OutputStructure`: Set to `OnlyLinks` to restrict the output to hyperlink elements only.
* `OutputTransformation`: Set to `$..text` (a JSONPath expression) to extract just the link text/URL values from the result.
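
To show what the recursive-descent expression `$..text` selects, here is a small pure-Python sketch that gathers every `text` value at any depth. The `sample` structure is hypothetical and only demonstrates the traversal; the real response layout may differ.

```python
def collect_text(node):
    """Mimic the JSONPath `$..text`: gather every `text` value at any depth."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "text":
                found.append(value)
            # Keep descending so nested matches are found too.
            found.extend(collect_text(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(collect_text(item))
    return found


# Hypothetical structure, only to show the traversal.
sample = {
    "pages": [
        {"links": [{"text": "https://example.com"}, {"text": "https://example.org"}]}
    ]
}
links = collect_text(sample)
```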

#### Extracting Invisible Text

**PDF** documents sometimes contain unwanted invisible text that makes it difficult to extract the desired content accurately, for example when the original document was scanned or saved with a low-quality setting. Setting `ExtractInvisibleText` to `false` leaves that text out of the result so that only the desired content is extracted.

```json theme={null}
{
 "profiles": "{ 'ExtractInvisibleText': false, 'ExtractShadowLikeText': false, 'OCRMode': 'Auto' }"
}
```

### OCR (Optical Character Recognition) Mode Options

The following values can be configured for OCR mode:

| OCR Mode                                   | Description                                                                   |
| ------------------------------------------ | ----------------------------------------------------------------------------- |
| `Auto` **(default)**                       | Automatically determines the optimal OCR settings based on the input.         |
| `AutoRepairFonts`                          | Automatically repairs fonts in text extracted from images or other documents. |
| `TextFromImagesAndFonts`                   | Extracts text from images and fonts from documents.                           |
| `TextFromImagesAndRepairedFonts`           | Extracts text from images and repaired fonts from documents.                  |
| `TextFromImagesAndVectorsAndFonts`         | Extracts text, vectors, and fonts from images and documents.                  |
| `TextFromImagesAndVectorsAndRepairedFonts` | Extracts text, vectors, and repaired fonts from images and documents.         |
| `TextFromImagesAndVectorsOnly`             | Extracts text and vectors from images only.                                   |
| `TextFromImagesOnly`                       | Extracts text from images only.                                               |
| `TextFromRepairedFontsOnly`                | Extracts text from documents with repaired fonts only.                        |
| `TextFromVectorsAndFonts`                  | Extracts text and fonts from documents with vectors.                          |
| `TextFromVectorsAndRepairedFonts`          | Extracts text and repaired fonts from documents with vectors.                 |
| `TextFromVectorsOnly`                      | Extracts text from documents with vectors only.                               |

```json theme={null}
{
 "profiles": "{ 'OCRMode': 'TextFromImagesAndVectorsAndRepairedFonts' }"
}
```
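
Since the mode names are long and easy to mistype, it can help to check them before sending a request. A minimal Python sketch, with the allowed values copied from the table above; `ocr_profile` is a hypothetical helper.

```python
import json

# Values copied from the OCR mode table above.
OCR_MODES = frozenset({
    "Auto",
    "AutoRepairFonts",
    "TextFromImagesAndFonts",
    "TextFromImagesAndRepairedFonts",
    "TextFromImagesAndVectorsAndFonts",
    "TextFromImagesAndVectorsAndRepairedFonts",
    "TextFromImagesAndVectorsOnly",
    "TextFromImagesOnly",
    "TextFromRepairedFontsOnly",
    "TextFromVectorsAndFonts",
    "TextFromVectorsAndRepairedFonts",
    "TextFromVectorsOnly",
})


def ocr_profile(mode="Auto"):
    """Return a profiles string for the given OCR mode, rejecting unknown names."""
    if mode not in OCR_MODES:
        raise ValueError(f"Unknown OCRMode: {mode!r}")
    return json.dumps({"OCRMode": mode})


profiles = ocr_profile("TextFromImagesOnly")
```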

### OCR (Optical Character Recognition) Resolution

OCR resolution can be set from `72` to `1200` DPI. The default value is `300` DPI. Higher resolutions generally improve OCR results but also lengthen processing times.

```json theme={null}
{
 "profiles": "{ 'OCRResolution': 300 }"
}
```
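
The documented range can be enforced on the client side before a request is made. A minimal Python sketch using the limits stated above; `resolution_profile` is a hypothetical helper.

```python
import json

MIN_DPI, MAX_DPI, DEFAULT_DPI = 72, 1200, 300


def resolution_profile(dpi=DEFAULT_DPI):
    """Return a profiles string for OCRResolution, checking the documented range."""
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(f"OCRResolution must be between {MIN_DPI} and {MAX_DPI} DPI")
    return json.dumps({"OCRResolution": dpi})


profiles = resolution_profile(600)
```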

#### Extracting Text from Colored Background

If text on a colored background cannot be extracted, add the Grayscale filter to the `profiles` as follows:

```json theme={null}
{
 "profiles": "{ 'OCRImagePreprocessingFilters.AddGrayscale()': [] }"
}
```

#### Considering the Font Color on Tables

Tables sometimes contain colored text that is difficult for OCR to extract. OCR results can be improved with the following:

```json theme={null}
{
 "profiles": "{ 'LineGroupingMode': 'JoinOrphanedRows', 'ConsiderFontColors': true, 'DetectNewColumnBySpacesRatio': '1.1', 'AutoAlignColumnsToHeader': false, 'OCRImagePreprocessingFilters.AddGammaCorrection()': [ '1.4' ] }"
}
```
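
A profile with this many options is easier to maintain as a dictionary that is serialized at the end. The Python sketch below assumes that filter keys follow the `OCRImagePreprocessingFilters.<Method>()` pattern seen in the two examples on this page; `preprocessing_filter` is a hypothetical helper, and only `AddGrayscale` and `AddGammaCorrection` are confirmed by this page.

```python
import json


def preprocessing_filter(method, *args):
    """Return the (key, value) pair for one OCR image preprocessing filter."""
    key = f"OCRImagePreprocessingFilters.{method}()"
    # Arguments are passed as a list of strings, matching the example above.
    return key, [str(arg) for arg in args]


options = {
    "LineGroupingMode": "JoinOrphanedRows",
    "ConsiderFontColors": True,
    "DetectNewColumnBySpacesRatio": "1.1",
    "AutoAlignColumnsToHeader": False,
}
key, value = preprocessing_filter("AddGammaCorrection", "1.4")
options[key] = value

# A single-line string, safe to embed in the JSON request body.
profiles = json.dumps(options)
```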

#### Setting the Rotation Angle

Normally, OCR detects **PDF** page rotation and extracts text properly. However, when a **PDF** is constructed so that the page is not rotated and the text is instead drawn vertically, OCR does not detect the rotation automatically. In such cases, use the following profile setting:

```json theme={null}
{
 "profiles": "{ 'RotationAngle': 2 }"
}
```

* `0` no rotation
* `1` 90 degrees
* `2` 180 degrees
* `3` 270 degrees
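
The mapping above is the angle divided by 90. A minimal Python sketch that converts degrees to the code and rejects unsupported angles; `rotation_profile` is a hypothetical helper.

```python
import json


def rotation_profile(degrees):
    """Map an angle of 0, 90, 180, or 270 degrees to the RotationAngle code."""
    if degrees not in (0, 90, 180, 270):
        raise ValueError("RotationAngle supports only 0, 90, 180, or 270 degrees")
    return json.dumps({"RotationAngle": degrees // 90})


profiles = rotation_profile(180)
```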
