
Automated Document Conversion and Digital Transformation at Scale with ActivePDF DocConverter

7 Nov 2018, CPOL
DocConverter is so powerful that it can convert millions of documents per day while it automatically encrypts, signs, emails, and FTPs documents after they are processed.

This article is in the Product Showcase section for our sponsors at CodeProject. These articles are intended to provide you with information on products and services that we consider useful and of value to developers.

Documents and their data are at the heart of every enterprise. Business professionals in every industry regularly share digital files such as PowerPoint, Word, Excel, and PDF documents. Most often these files contain large amounts of text, high-resolution images, and embedded attachments. Managing these documents and data streams across hundreds of possible file formats can quickly become an unwieldy task without the proper digital transformation and automation solutions in place.

What businesses need is a way to transform documents and their data automatically and on demand, whether to produce a standard format or to meet the requirements of the recipient.

ActivePDF provides a popular product called DocConverter that empowers developers and even IT administrators to convert documents to and from PDF at scale with little to no code. It watches user-defined folders, also known as "Watch Folders," for files, and then converts them as they arrive.

DocConverter is built for volume: it can convert millions of documents per day, and after processing it can automatically encrypt, sign, email, and upload the documents via FTP. It can even extract compressed ZIP files and convert their contents as part of the conversion process.

How to Configure Automated Document Conversion

First, download and start your free trial of DocConverter on your server here:

https://www.activepdf.com/products/docconverter

After DocConverter is installed, you can click the desktop icon for the configuration manager:


Figure 1: Configuration launcher for ActivePDF DocConverter.

The shortcut opens a local web page, which you can also open directly in a browser. All configuration happens on this page.

From the main page you can:

  • Edit profiles for PDF conversions
  • Set up your print-to-PDF preferences
  • Configure email preferences
  • Specify FTP sites
  • Create profiles for conversions from various document sources
  • Choose which folders trigger automatic conversion
  • Perform other tasks, such as configuring signatures and filters

To get started, open the "Watch Folder Profiles" tab and note the default input and output paths for document conversion.

Automatic Document Conversion Settings and Preferences

DocConverter automatically converts files dropped into a folder location of your choice. Conversion to and from PDF works with hundreds of file formats and many customizable configurations. Note also that DocConverter automatically unzips compressed files and converts every item in the archive.
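Because a Watch Folder is an ordinary directory, any program can feed it. The following C# sketch copies a document into an input folder and then polls the output folder for the result. The class and method names are hypothetical, and so is the assumption that the converted file keeps the source file's base name with a .pdf extension; check the naming in your own profile. The code uses only System.IO and calls no DocConverter API.

```csharp
using System;
using System.IO;
using System.Threading;

// Hypothetical helper for feeding a DocConverter Watch Folder from code.
// Uses only System.IO; it does not call any DocConverter API.
public static class WatchFolderClient
{
    // Copies the source document into the watch folder's input directory
    // and returns the path of the copy.
    public static string Submit(string sourceFile, string inputDir)
    {
        string target = Path.Combine(inputDir, Path.GetFileName(sourceFile));
        File.Copy(sourceFile, target, true);
        return target;
    }

    // Polls the output directory until "<base name>.pdf" appears.
    // ASSUMPTION: the output keeps the input's base name; verify in your profile.
    // Returns the PDF path, or null if nothing appears before the timeout.
    public static string WaitForPdf(string sourceFile, string outputDir, TimeSpan timeout)
    {
        string expected = Path.Combine(outputDir,
            Path.GetFileNameWithoutExtension(sourceFile) + ".pdf");
        DateTime deadline = DateTime.UtcNow + timeout;
        while (true)
        {
            if (File.Exists(expected)) return expected;
            if (DateTime.UtcNow >= deadline) return null;
            Thread.Sleep(250);
        }
    }
}
```

For example, call `WatchFolderClient.Submit(@"c:\temp\report.docx", inputDir)` and then `WatchFolderClient.WaitForPdf(@"c:\temp\report.docx", outputDir, TimeSpan.FromMinutes(1))`, where `inputDir` and `outputDir` are the paths shown on the "Watch Folder Profiles" tab. Polling keeps the sketch short; `FileSystemWatcher` is an event-driven alternative, and a production client should also confirm the PDF is fully written before opening it.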

How to Convert Word to PDF

To test this scenario, create a simple Word document with a few formatted lines and drop it into the "Input" folder. The file disappears almost immediately, which shows that the system recognized it and started processing. The result is a high-fidelity PDF in the corresponding "Output" folder that renders perfectly:


Figure 2: Convert Word to PDF test with DocConverter from ActivePDF.
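In an automated pipeline it can help to sanity-check what lands in the "Output" folder before passing it on. This C# sketch relies on one fact from the PDF specification (ISO 32000): a well-formed PDF file begins with the header `%PDF-`. The class and method names are hypothetical, and a header check only catches empty or obviously wrong files, not every kind of corruption.

```csharp
using System.IO;
using System.Text;

// Hypothetical sanity check for converted files: a PDF starts with "%PDF-".
public static class PdfCheck
{
    public static bool LooksLikePdf(string path)
    {
        byte[] header = new byte[5];
        using (FileStream fs = File.OpenRead(path))
        {
            // Read exactly five bytes; Stream.Read may return fewer per call.
            int read = 0;
            while (read < header.Length)
            {
                int n = fs.Read(header, read, header.Length - read);
                if (n == 0) return false; // file is shorter than the header
                read += n;
            }
        }
        return Encoding.ASCII.GetString(header) == "%PDF-";
    }
}
```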

How to Convert Excel to PDF

Next, drop an Excel spreadsheet into the "Input" folder of your choice; this example uses a spreadsheet containing a workout program. The converted PDF again appears almost immediately in the "Output" folder:


Figure 3: Convert Excel to PDF test with DocConverter from ActivePDF.

How to Convert Images to PDF from a .ZIP File

For a final test, collect several images and zip them into a single file. Drop the ZIP file into the target directory, wait a few seconds, and view the result. The following screenshot shows the output when conversion is configured to merge multiple image files into a single PDF document:


Figure 4: Convert Image to PDF test with multiple images in a .ZIP file with DocConverter from ActivePDF.
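Packaging the images can be scripted as well. This C# sketch uses `System.IO.Compression` to pack a set of image files into one ZIP archive and then moves it into an input folder. The class name, method name, and folder paths are hypothetical. On .NET Framework, `ZipFile` and `CreateEntryFromFile` require references to the System.IO.Compression and System.IO.Compression.FileSystem assemblies.

```csharp
using System.Collections.Generic;
using System.IO;
using System.IO.Compression;

// Hypothetical helper: zips image files and hands the archive to a watch folder.
public static class ZipSubmitter
{
    // Creates zipName inside inputDir containing each image (flat, by file name)
    // and returns the archive's final path.
    public static string ZipIntoWatchFolder(IEnumerable<string> imageFiles, string inputDir, string zipName)
    {
        // Build the archive in the temp folder first, so it is complete
        // before it is moved into the watch folder.
        string tempZip = Path.Combine(Path.GetTempPath(), Path.GetRandomFileName() + ".zip");
        using (ZipArchive archive = ZipFile.Open(tempZip, ZipArchiveMode.Create))
        {
            foreach (string file in imageFiles)
            {
                archive.CreateEntryFromFile(file, Path.GetFileName(file));
            }
        }

        // Replace any earlier archive with the same name, then move the new one in.
        string target = Path.Combine(inputDir, zipName);
        if (File.Exists(target)) File.Delete(target);
        File.Move(tempZip, target);
        return target;
    }
}
```

Within one volume `File.Move` is a rename; across volumes it copies and deletes, so keeping the temporary file on the same drive as the watch folder gives DocConverter the least chance of seeing a partial archive.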

The drag-and-drop "Watch Folder" feature is a convenient way to roll out automated conversion across an organization quickly, but DocConverter does more than automate document conversion. It also provides a set of flexible APIs that let developers convert documents programmatically, with full control over how the conversion runs and what happens to the output.

How to Convert Text to PDF as a Developer Using DocConverter APIs

The APDocConverter.DocConverter class provides full functionality to convert documents to and from the PDF format. To see it in action, open Visual Studio and create a simple .NET console application. Add a reference to APDocConverter.Net45 (the documentation describes where to find the DLL). The following code is all you need to take a document path from the command line and convert the document to PDF:

using System;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            // The path of the source document is the first command-line argument
            if (args.Length == 0)
            {
                Console.WriteLine("Usage: ConsoleApp1 <path-to-document>");
                return;
            }

            // Convert the source document to PDF and save it as output.pdf
            var dc = new APDocConverter.DocConverter();
            dc.ConvertToPDF(args[0], "output.pdf");
        }
    }
}

From the command line, run the program and pass the full path to a source document. The program immediately converts the document and saves a file named output.pdf in the same directory. For example, this command:

./ConsoleApp1.exe "c:/temp/sample.txt"

produced the following output:


Figure 5: UTF-8 text document with emojis API test with DocConverter from ActivePDF.

The conversion works in the other direction, too.

How to Convert from PDF to HTML as a Developer Using DocConverter APIs

DocConverter can convert PDF into a range of image formats and other file types. It can also use Scalable Vector Graphics (SVG) to render a pixel-perfect HTML version of a PDF. Update the body of Main in the console app as follows:

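var dc = new APDocConverter.DocConverter();
// Treat .pdf inputs (in any letter case) as PDF-to-HTML conversions
if (args[0].EndsWith(".pdf", StringComparison.OrdinalIgnoreCase))
{
    dc.ConvertFromPDF(args[0], APDocConverter.FromPDFFunction.ToHtmlEx, "output.html");
}
else
{
    // Everything else is converted to PDF
    dc.ConvertToPDF(args[0], "output.pdf");
}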

The application now checks the file extension: when it receives a PDF, it converts it to HTML; otherwise it converts the input to PDF. It also handles complex PDF documents. In this example, the application was called with a sample government W-9 form:

./ConsoleApp1.exe "c:/temp/fw9.pdf"

The resulting HTML renders the form perfectly, as evidenced by this screenshot of the output from a web browser:


Figure 6: Convert PDF to HTML test with DocConverter from ActivePDF.
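One possible refinement is to name each output after its input instead of reusing output.pdf or output.html. The following C# sketch shows a hypothetical `ConversionPlan` helper that compares the extension case-insensitively (so FW9.PDF is treated like fw9.pdf) and places the result next to the source file. It calls no DocConverter API; the conversion calls themselves stay as shown above.

```csharp
using System;
using System.IO;

// Hypothetical helper: decides the conversion direction and the output path.
public static class ConversionPlan
{
    // True when the input is already a PDF, matching ".pdf" in any letter case.
    public static bool IsPdf(string inputPath) =>
        string.Equals(Path.GetExtension(inputPath), ".pdf", StringComparison.OrdinalIgnoreCase);

    // Writes next to the input: PDF inputs become .html, everything else becomes .pdf.
    public static string OutputPathFor(string inputPath) =>
        Path.ChangeExtension(inputPath, IsPdf(inputPath) ? ".html" : ".pdf");
}
```

In `Main`, `ConversionPlan.IsPdf(args[0])` would choose between `ConvertFromPDF` and `ConvertToPDF`, and `ConversionPlan.OutputPathFor(args[0])` would replace the hard-coded output file name.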

How to Use the DocConverter REST API for PDF Conversion over HTTP

In addition to the solution that can be deployed on a Windows-based server, DocConverter provides a REST API that enables you to convert documents from any language or platform that supports the HTTP protocol. The endpoint is hosted on the same port as the administration page.

The following program uses the REST API to convert a sample document to PDF. It is written in .NET Core and can run on Linux or macOS as well as Windows (adjust the Windows-style file paths accordingly); it only needs network access to the DocConverter server.

using System;
using System.IO;
using System.Net;
using System.Net.Http;
using System.Net.Http.Headers;
using System.Threading.Tasks;

namespace ConsoleApp1
{
    class Program
    {
        static void Main(string[] args)
        {
            Run().GetAwaiter().GetResult();
        }

        public static async Task Run()
        {
            using (HttpClient client = new HttpClient())
            {
                //Allow up to 15 minutes for large conversions
                client.Timeout = TimeSpan.FromMinutes(15);

                //REST API endpoint to convert to PDF with json settings
                //Replace localhost with the IP address of the server where DocConverter is installed
                Uri uri = new Uri("http://localhost:62625/api/DocConverter/Conversion");

                //Copy the input file to convert to PDF into a MemoryStream
                MemoryStream ms = new MemoryStream();
                using (FileStream fs = File.OpenRead(@"c:\temp\sample.txt"))
                {
                    fs.CopyTo(ms);
                }
                ms.Seek(0, SeekOrigin.Begin);

                //Multipart content; disposing it also disposes its parts and the MemoryStream
                using (MultipartContent content = new MultipartContent())
                {
                    //Create the StreamContent of the input file and add it to the MultipartContent
                    StreamContent is1 = new StreamContent(ms);
                    is1.Headers.ContentType = new MediaTypeHeaderValue("text/plain");
                    is1.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
                    {
                        Name = "sample",
                        FileName = "sample.txt"
                    };
                    content.Add(is1);

                    //Post the MultipartContent to the endpoint (uri) to convert the input file to PDF using json settings
                    using (HttpResponseMessage hrm = await client.PostAsync(uri, content))
                    {
                        if (hrm.StatusCode == HttpStatusCode.OK)
                        {
                            byte[] bytes = await hrm.Content.ReadAsByteArrayAsync();
                            File.WriteAllBytes(@"c:\temp\Converted.pdf", bytes);
                        }
                        else
                        {
                            //Get the error message if something went wrong
                            string error = await hrm.Content.ReadAsStringAsync();
                            Console.WriteLine(error);
                        }
                    }
                }
            }
        }
    }
}

The example uses the default settings for local testing and debugging. It is possible to change the configuration to require an authentication token to secure the REST API endpoint.
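The sample hard-codes `text/plain` for a .txt input. If you send other file types, a small helper can set the part headers from the file extension. This C# sketch is hypothetical: the MIME values are standard registrations, but whether DocConverter relies on the `Content-Type` of each part is an assumption, and unknown extensions fall back to `application/octet-stream`. The part is otherwise built like the one in the sample, with an `attachment` disposition carrying a name and the original file name.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Net.Http;
using System.Net.Http.Headers;

// Hypothetical helper for building upload parts for the REST endpoint.
public static class ConversionUpload
{
    // Standard MIME types for a few common inputs (extension match ignores case).
    static readonly Dictionary<string, string> MimeTypes =
        new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase)
        {
            [".txt"] = "text/plain",
            [".html"] = "text/html",
            [".docx"] = "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
            [".xlsx"] = "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
            [".png"] = "image/png",
            [".jpg"] = "image/jpeg",
            [".zip"] = "application/zip",
        };

    // Looks up the MIME type by extension; ASSUMPTION: octet-stream is acceptable otherwise.
    public static string MimeTypeFor(string path) =>
        MimeTypes.TryGetValue(Path.GetExtension(path), out string mime) ? mime : "application/octet-stream";

    // Wraps a file in a StreamContent part with Content-Type and Content-Disposition set.
    public static StreamContent CreateFilePart(string path)
    {
        var part = new StreamContent(File.OpenRead(path));
        part.Headers.ContentType = new MediaTypeHeaderValue(MimeTypeFor(path));
        part.Headers.ContentDisposition = new ContentDispositionHeaderValue("attachment")
        {
            Name = Path.GetFileNameWithoutExtension(path),
            FileName = Path.GetFileName(path)
        };
        return part;
    }
}
```

Add the returned part to a `MultipartContent` as in the sample; disposing the `MultipartContent` also disposes the part and closes the underlying file stream.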

How to Use the Advanced Features of DocConverter

As noted earlier, DocConverter does more than convert files. It can encrypt output files, add password protection, digitally sign documents, and apply watermarks or "stamps" to the resulting documents. It can also automatically email the results and/or upload them to an FTP server.

Advanced Features - How to Set Up FTP Server Uploads and/or Email with DocConverter

The following code snippet illustrates how to configure a request to upload to an FTP server:

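// Assumed setup (not part of the original snippet): the converter object
// and the folder that holds the file to upload
var oDC = new APDocConverter.DocConverter();
string strPath = @"c:\temp\";

// Set up the FTP request, supplying credentials if needed
oDC.AddFTPRequest("127.0.0.1", "/folder");
oDC.SetFTPCredentials("user", "pass");

// Set which files will upload with the FTP request
// To attach a binary file, use AddFTPBinaryAttachment
oDC.FTPAttachOutput = true;
oDC.AddFTPAttachment(strPath + "file.txt");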

The programmatic model for email attachments is similarly straightforward.

Conclusion

DocConverter is a high-volume document transformation solution that provides multiple methods for automated document conversion to and from PDF. You can automate PDF conversion simply by dropping files into a folder or file share; DocConverter converts them and can automatically upload or email the results, based on the settings you establish in the configuration manager.

Developers can integrate PDF capabilities directly into their applications with minimal code by using the .NET API or the REST API. DocConverter is a solution you can deploy once and scale to manage millions of documents a day.

To get started, get your free trial of DocConverter from ActivePDF:

https://www.activepdf.com/products/docconverter

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Program Manager, Microsoft
United States
Note: articles posted here are independently written and do not represent endorsements nor reflect the views of my employer.

I am a Program Manager for .NET Data at Microsoft. I have been building enterprise software with a focus on line of business web applications for more than two decades. I'm the author of several (now historical) technical books including Designing Silverlight Business Applications and Programming the Windows Runtime by Example. I use the Silverlight book every day! It props up my monitor to the correct ergonomic height. I have delivered hundreds of technical presentations in dozens of countries around the world and love mentoring other developers. I am co-host of the Microsoft Channel 9 "On .NET" show. In my free time, I maintain a 95% plant-based diet, exercise regularly, hike in the Cascades and thrash Beat Saber levels.

I was diagnosed with young onset Parkinson's Disease in February of 2020. I maintain a blog about my personal journey with the disease at https://strengthwithparkinsons.com/.

