JSON type of notation along with the profiles object for your API calls, for example:
Generic Profile Options
The following profiles options are not specific to any one particular endpoint.
Standard Parameters
The std_params object within the profiles parameter lets you define regular API parameters in JSON format. It is designed to simplify passing standard parameters and additional options in the profiles parameter for PDF.co API requests.
When using Standard Parameters, webhooks can be utilized by setting the callback object with the URL of your choice. However, it is simpler to set the callback object directly. See Webhooks & Callbacks for more.
When std_params is used in the profiles parameter and a parameter is defined both inside std_params and outside profiles, the value specified in std_params overwrites the duplicate value. Therefore, if you define a callback object in std_params, it will overwrite any value you may have defined via the basic callback object!
std_params Structure
- Description: Contains key-value pairs of standard parameters that will be used across PDF.co API requests.
- Type: JSONObject (passed as a string)
- Example:
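A minimal Python sketch of this structure and of the precedence rule described above. The url and pages parameter names and the example.com addresses are illustrative assumptions, not taken from this page. effective_params is a hypothetical local helper that only mimics the documented overwrite behavior; it is not part of any PDF.co SDK.

```python
import json


def build_profiles(std_params, **other_options):
    """Serialize profile options (including std_params) into a JSON string."""
    return json.dumps({"std_params": std_params, **other_options})


def effective_params(body):
    """Mimic the documented precedence locally: std_params wins over duplicates."""
    merged = {key: value for key, value in body.items() if key != "profiles"}
    profiles = json.loads(body.get("profiles", "{}"))
    merged.update(profiles.get("std_params", {}))
    return merged


body = {
    "url": "https://example.com/sample.pdf",  # assumed parameter name, placeholder file
    "pages": "0-",  # assumed parameter, duplicated in std_params below
    "profiles": build_profiles({"pages": "0", "callback": "https://example.com/hook"}),
}

# The std_params value replaces the top-level one.
print(effective_params(body)["pages"])
```

Note that profiles is sent as a string containing JSON, not as a nested object, which is why the sketch serializes it separately.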
Practical Application
Using the std_params profile, you can define a set of standard parameters and configurations that will be consistently applied across your PDF.co API requests. This approach is particularly beneficial when using automation platforms like Zapier, Make, and others, where the number of parameters you can pass directly is limited.
Complete Request Example
Here is a complete example illustrating the use of the std_params profile with other parameters:
/pdf/convert/to/text
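As a hedged sketch of what such a request could look like, the Python below builds (but does not send) a POST request for this endpoint. The base URL, the x-api-key header, the url and pages parameter names, and the example.com addresses are assumptions made for illustration; check them against your account's API reference before use.

```python
import json
import urllib.request

API_BASE = "https://api.pdf.co/v1"  # assumed base URL
API_KEY = "YOUR_API_KEY"  # placeholder, replace with a real key


def build_text_request(source_url, std_params):
    """Build (but do not send) a POST request for /pdf/convert/to/text."""
    body = {
        "url": source_url,  # assumed name of the source-file parameter
        "profiles": json.dumps({"std_params": std_params}),  # JSON passed as a string
    }
    return urllib.request.Request(
        API_BASE + "/pdf/convert/to/text",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


req = build_text_request(
    "https://example.com/sample.pdf",
    {"pages": "0-1", "callback": "https://example.com/hook"},
)
# urllib.request.urlopen(req) would send it; omitted so the sketch runs offline.
```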
Output as Base64
If you require your output as base64, use the following:
Converting PDFs
There are a variety of profiles options which can be set when converting from PDF to other documents. These profiles control how information is extracted from the source PDF file.
These options apply to the following endpoints:
- /pdf/convert/to/csv
- /pdf/convert/to/xml
- /pdf/convert/to/json
- /pdf/convert/to/json2
- /pdf/convert/to/xls
- /pdf/convert/to/xlsx
Convert Vectors
You can choose whether the conversion process should convert vectors or not as follows:
Save Images
This profiles parameter includes the SaveImages property, which extracts the individual images in a regular PDF.
Consider Font Size
This profiles parameter allows you to separate header and body text based on font size.
Set the Extraction Area
Extract text in a specific area by defining the extraction area, set with points in the format [x, y, width, height].
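A small Python sketch of building such a profile. The key name ExtractionArea is an assumption (this page only gives the value format), and the range checks are local sanity checks rather than documented API rules.

```python
import json


def extraction_area_profile(x, y, width, height):
    """Return a profiles string limiting extraction to a rectangle in points."""
    if x < 0 or y < 0:
        raise ValueError("x and y must be non-negative")
    if width <= 0 or height <= 0:
        raise ValueError("width and height must be positive")
    # "ExtractionArea" is an assumed key name; the value follows [x, y, width, height].
    return json.dumps({"ExtractionArea": [x, y, width, height]})


area_profiles = extraction_area_profile(50, 100, 300, 150)
```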
Extracting Invisible Text
When dealing with PDF documents, sometimes there may be unwanted invisible text that makes it difficult to extract the desired content accurately. This could be due to various reasons such as the original document being scanned or saved with a low-quality setting. In such cases, it is important to remove the unwanted invisible text to ensure accurate extraction of the desired content.
OCR (Optical Character Recognition) Mode Options
The following values can be configured for OCR mode:

| OCR Mode | Description |
|---|---|
| Auto (default) | Automatically determines the optimal OCR settings based on the input. |
| AutoRepairFonts | Automatically repairs fonts in text extracted from images or other documents. |
| TextFromImagesAndFonts | Extracts text from images and fonts from documents. |
| TextFromImagesAndRepairedFonts | Extracts text from images and repaired fonts from documents. |
| TextFromImagesAndVectorsAndFonts | Extracts text, vectors, and fonts from images and documents. |
| TextFromImagesAndVectorsAndRepairedFonts | Extracts text, vectors, and repaired fonts from images and documents. |
| TextFromImagesAndVectorsOnly | Extracts text and vectors from images only. |
| TextFromImagesOnly | Extracts text from images only. |
| TextFromRepairedFontsOnly | Extracts text from documents with repaired fonts only. |
| TextFromVectorsAndFonts | Extracts text and fonts from documents with vectors. |
| TextFromVectorsAndRepairedFonts | Extracts text and repaired fonts from documents with vectors. |
| TextFromVectorsOnly | Extracts text from documents with vectors only. |
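The mode names above can be checked locally before a request is sent. The sketch below is one way to do that in Python; the key name OCRMode is an assumption, while the mode values are copied from the table.

```python
import json

# Mode names as listed in the table above.
OCR_MODES = frozenset({
    "Auto",
    "AutoRepairFonts",
    "TextFromImagesAndFonts",
    "TextFromImagesAndRepairedFonts",
    "TextFromImagesAndVectorsAndFonts",
    "TextFromImagesAndVectorsAndRepairedFonts",
    "TextFromImagesAndVectorsOnly",
    "TextFromImagesOnly",
    "TextFromRepairedFontsOnly",
    "TextFromVectorsAndFonts",
    "TextFromVectorsAndRepairedFonts",
    "TextFromVectorsOnly",
})


def ocr_mode_profile(mode="Auto"):
    """Return a profiles string for the given OCR mode, rejecting unknown names."""
    if mode not in OCR_MODES:
        raise ValueError(f"unknown OCR mode: {mode!r}")
    return json.dumps({"OCRMode": mode})  # "OCRMode" is an assumed key name


mode_profiles = ocr_mode_profile("TextFromImagesOnly")
```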
OCR (Optical Character Recognition) Resolution
OCR resolution can be set from 72 to 1200 DPI. The default value is 300 DPI. The higher the resolution, the better the OCR results. However, higher resolution also means longer processing times.
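A short Python sketch that enforces the documented range and default before building the profile. The key name OCRResolution is an assumption; only the 72 to 1200 DPI range and the 300 DPI default come from this page.

```python
import json

MIN_DPI = 72
MAX_DPI = 1200
DEFAULT_DPI = 300


def ocr_resolution_profile(dpi=DEFAULT_DPI):
    """Return a profiles string for an OCR resolution within the documented range."""
    if not MIN_DPI <= dpi <= MAX_DPI:
        raise ValueError(f"dpi must be between {MIN_DPI} and {MAX_DPI}, got {dpi}")
    return json.dumps({"OCRResolution": dpi})  # "OCRResolution" is an assumed key name


resolution_profiles = ocr_resolution_profile(600)
```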
Extracting Text from Colored Background
If you can’t extract text with a colored background, please add the Grayscale filter to the profiles as follows:
Considering the Font Color on Tables
Sometimes a table contains colored text that is difficult for OCR to extract. OCR results can be improved with the following:
Setting the Rotation Angle
Normally, OCR detects PDF rotation and extracts text properly. However, when a PDF is constructed so that the page is not rotated and the text is instead drawn vertically, OCR does not detect the rotation automatically. In such scenarios, set the rotation angle in the profile to one of the following values:
- 0: no rotation
- 1: 90 degrees
- 2: 180 degrees
- 3: 270 degrees
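The value mapping above (0 for no rotation, 1 for 90 degrees, 2 for 180 degrees, 3 for 270 degrees) can be wrapped in a small helper so callers work in degrees. This is a sketch only; the key name RotationAngle is an assumption.

```python
import json

# Degrees mapped to the documented setting values.
ROTATION_CODES = {0: 0, 90: 1, 180: 2, 270: 3}


def rotation_profile(degrees):
    """Return a profiles string for a rotation given in degrees."""
    if degrees not in ROTATION_CODES:
        raise ValueError("degrees must be one of 0, 90, 180, 270")
    # "RotationAngle" is an assumed key name.
    return json.dumps({"RotationAngle": ROTATION_CODES[degrees]})


rotation_profiles = rotation_profile(90)
```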