Add Images and Textboxes to PDF

pmpdesign

May 2, 2007

CPOL

A lightweight C# library to add images and 'round rectangles' to a PDF on the fly and then securely embed the PDF in a web page

Download source and examples - 44.0 KB

Introduction

Screenshot - screenshot.jpg

I needed a method to create PDF documents on the fly, specifically invoices and similar financial documents.

This software also had to integrate seamlessly into an application I was developing.

Obviously, there is a lot of software available, ranging from open source to fairly expensive commercial applications. However, I wanted something that I could incorporate into a commercial application with minimal or no license fees, and with no support problems caused by not having access to the source code.

Another criterion was to keep the code to a minimum. I saw no reason to have 3MB of code when I probably only needed 5% of it.

After searching the web for a long time, I finally came across an article on CodeProject (PDF Library for creating PDF with tables and text, in C#). This excellent article by Zainu introduced me to the concepts behind creating a PDF.

Zainu's article introduces the basic concepts required to create a PDF structure and add text, either as a single line or sentence or formatted as a table.

With Zainu's permission, I have extended his codebase to add JPG images, to add textboxes in the form of 'rounded rectangles' (as seen in the image to the right) and, finally, to display the finished PDF inside a web page rather than linking to it with the usual <a href="abc.pdf">.

Background

In order to fully understand the code behind this library, you should read the article by Zainu as I will not cover the same topics again here.

Some of the original code has been modified; these changes are all commented in the code itself.

The workings of a PDF

Although I am not going to revisit the original article concepts, I will revisit the concept behind the PDF to show what is required to add images and 'round rectangles'.

This is an example of the PDF markup that will be generated if you download and run the attached code.

Don't forget that, even though you can read the markup in a text editor, a PDF is in fact a binary file and (except in the simplest cases) must be treated as such.

You can find a much more detailed explanation of the PDF by downloading the Adobe PDF Manual. It is only 1300 pages or so....

PDF Markup	What it means
%PDF-1.5 %??	This is a PDF version 1.5 file. The two characters after the second % (shown here as question marks) are non-text bytes, there simply so that FTP and similar packages know that this is a binary file when transferring.
8 0 obj << /Type /Page/Parent 2 0 R /Rotate 0 /MediaBox [0 0 595 842]/CropBox [0 0 595 842] /Resources<</ProcSet[/PDF/Text] /Font<</T1 3 0 R/T2 4 0 R/T3 5 0 R/T4 6 0 R>> /XObject <</I1 10 0 R >>>> /Contents 9 0 R >> endobj	The 'X 0 obj' means that this is an object in the PDF - the X is its unique number. This object (8 0) describes one page, setting the page size and then defining the resources that will be needed, in this case the fonts called T1, T2, T3 and T4 which are described in the objects 3 0, 4 0, 5 0 and 6 0 respectively. The XObject refers in this case to an image called I1, the data that describes it can be found in the object 10 0. Finally, the 'Contents' (essentially the markup which tells the PDF what to display) can be found in the object numbered 9 0. The 'Parent' reference is to object 2 0 which records the number of pages in the entire document (in this case only 1) and shows which object describes the contents of each page.
9 0 obj<</Length 989 >>stream q 144 0 0 100 300 700 cm 1 0 0 1 0 0 cm /I1 Do Q BT/T3 12 Tf 105 699 Td (Round Rectangle Header) Tj ET endstream endobj	This object contains the markup that describes the actual page, i.e. instructions such as 'place the text "XYZ" at location x,y in font Z' or 'draw an image at x,y of size w,h'. In the code itself, this markup is added with `SetStream`.
1 0 obj<</Type /Catalog/Lang(EN-US)/Pages 2 0 R>> endobj	Root of the file, says that the index (or pagetree) to the document can be found in object 2 0
2 0 obj<</Count 1/Kids [ 8 0 R ]>> endobj	The document index - this document has one page (Kids) and the information can be found in object 8 0
3 0 obj<</Type/Font/Name /T1/BaseFont/Times-Roman /Subtype/Type1/Encoding /WinAnsiEncoding>> endobj 4 0 obj<</Type/Font/Name /T2/BaseFont/Times-Italic /Subtype/Type1/Encoding /WinAnsiEncoding>> endobj 5 0 obj<</Type/Font/Name /T3/BaseFont/Times-Bold /Subtype/Type1/Encoding /WinAnsiEncoding>> endobj 6 0 obj<</Type/Font/Name /T4/BaseFont/Courier /Subtype/Type1/Encoding /WinAnsiEncoding>> endobj	Describes the fonts used in the document.
10 0 obj <</Name /I1 /Type /XObject /Subtype /Image /Width 144 /Height 100 /Length 29779 /Filter /DCTDecode /ColorSpace /DeviceRGB /BitsPerComponent 8 >> stream [ byte data to represent the jpg image ] endstream endobj	Describes an image. Note that the actual byte data is missing - you will find out how to add that later.
7 0 obj<</ModDate(D:20070501024237+10'00') /CreationDate(D:20070501024237+10'00') /Title(Title)/Creator(Your App Name) /Author(System Generated) /Producer(www.My New App.com.au)/Company(My Company Name)>> endobj	Properties of the document, e.g. who created it and when.
xref 0 11 0000000000 65535 f 0000001275 00000 n 0000001332 00000 n 0000001374 00000 n 0000001473 00000 n 0000001573 00000 n 0000001671 00000 n 0000031745 00000 n 0000000014 00000 n 0000000234 00000 n 0000001766 00000 n	The byte offsets of each object in the document. This is explained in the original article by Zainu.
trailer <</Size 11 /Root 1 0 R /Info 7 0 R /ID[<5181383ede94727bcb32ac27ded71c68> <5181383ede94727bcb32ac27ded71c68>] >>	'Root' refers to the starting point of the document, the catalog (object 1 0 in this case), which in turn points to the pagetree.
startxref 31959 %%EOF	End of the file.
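
As a rough illustration of the xref rows in the table above: each in-use entry is the object's byte offset padded to ten digits, followed by a five-digit generation number and the letter n. The sketch below is not taken from PDFLibrary.cs; the class and method names are made up for this example, and it assumes every object has generation 0.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical helper, not part of PDFLibrary.cs.
public static class XrefSketch
{
    // Formats one in-use entry: 10-digit byte offset, 5-digit generation, 'n'.
    public static string FormatEntry(long byteOffset)
    {
        return byteOffset.ToString("D10") + " 00000 n";
    }

    // Builds the whole table; offsets[i] is the byte offset of object i + 1.
    public static string BuildTable(IList<long> offsets)
    {
        StringBuilder sb = new StringBuilder();
        sb.Append("xref\r\n");
        sb.Append("0 " + (offsets.Count + 1) + "\r\n");
        sb.Append("0000000000 65535 f\r\n");   // entry 0 is the head of the free list
        foreach (long offset in offsets)
        {
            sb.Append(FormatEntry(offset) + "\r\n");
        }
        return sb.ToString();
    }
}
```

For example, XrefSketch.FormatEntry(1275) gives the row shown above for object 1 0.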

Using the code

Download the zip file above and extract to a suitable location (or create a new web application in Visual Studio). The zip file contains six files:

PDFLibrary.cs
Default.aspx
Default.aspx.cs
streampdf.aspx
streampdf.aspx.cs
myimage.jpg

The file PDFLibrary.cs should be placed in the App_Code folder, the rest in the root of the application.

Point your browser at the default.aspx file and you should get a button displayed. Clicking this button should create the PDF and display it within the web page.

How it all works

Screenshot - bezier.jpg

Let's take a look firstly at the 'round rectangles'.

These are based on 'Cubic Bezier Curves'. If you have ever used Photoshop or similar graphical software, you may have used this method without even knowing it.

Essentially, all we do to create the 'round rectangle' is to use eight paths to form an area: four corner curves, each based on a Bezier curve, plus four straight lines which connect them. This area is then stroked to form the border and filled to form the background. A simple rectangle is then drawn on top of the Bezier area to form the text box.

If you are interested in the full details, have a look in the Adobe PDF Manual which describes the mathematics behind it. For the rest of us, all we need to know is that it works!

Let's have a look at the actual code now. First we create a new object to represent the rectangle in code:

RoundRectangle rr = new RoundRectangle();

and then specify the colours for the border, main background and background colour for the textbox.

ColorSpec rrBorder = new ColorSpec(0, 0, 0);        //main border colour
ColorSpec rrMainBG = new ColorSpec(204, 204, 204);  //background colour of the 
                                                    //round rectangle
ColorSpec rrTBBG = new ColorSpec(255, 255, 255);    //background colour of the 
                                                    //rectangle on top of the 
                                                    //round rectangle

Finally, as this is only markup as far as the PDF is concerned, we add the markup to the PDF content stream.

content.SetStream("q\r\n");          //save the current PDF graphics state
content.SetStream(rr.DrawRoundRectangle(45, 582, 240, 130, 20, 0.55, 20, 90, 
    1, rrBorder, rrMainBG, rrTBBG));   //Draw the rectangle
content.SetStream("Q\r\n");         //restore the saved graphics state

There are twelve parameters for the method DrawRoundRectangle:

    LLX
    LLY
    rrWidth
    rrHeight
    CornerRadius
    Circularity
    HeaderHeight
    TextBoxHeight
    Border
    BorderColor
    MainBG
    TextBoxBG

Screenshot - figure2.jpg

LLX and LLY are the horizontal and vertical coordinates of the lower left of the box, rrWidth and rrHeight are the width and height of the box (remember all coordinates are in 1/72" rather than pixels).
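
Since all coordinates are in units of 1/72", a small conversion helper can be handy when a layout is specified in millimetres. This is only a sketch; the class and method names are invented for this example and are not part of the library.

```csharp
using System;

// Hypothetical helper, not part of PDFLibrary.cs.
public static class UnitSketch
{
    public const double PointsPerInch = 72.0;
    public const double MillimetresPerInch = 25.4;

    // Converts a length in millimetres to PDF units (1/72").
    public static double MillimetresToPoints(double mm)
    {
        return mm / MillimetresPerInch * PointsPerInch;
    }
}
```

An A4 page is 210 mm by 297 mm, which rounds to the 595 by 842 units used in the MediaBox of the sample markup.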

The CornerRadius parameter is as shown in Figure 2. The HeaderHeight parameter is the vertical height of the area at the top where you can later place text. It cannot be less than the corner radius; otherwise, the text area rectangle placed over the top will overlap the curved corners.

TextBoxHeight is the height of the text box and will be centred vertically. The last three colour parameters are the three ColorSpec values we created earlier.

Finally, there is the Circularity parameter. This is used to change the actual shape of the corners of the box.

As I wanted each corner to be a mirror image of the others, I decided to calculate the (x2,y2) and (x3,y3) values shown in Figure 1 (which are the values in PDF markup to describe the curve) based on the radius of the corner and a constant which I called Circularity. The value for (x1,y1) is the current graphics cursor position in the PDF and the (x4,y4) value is the end point of the curve and also the new graphics cursor position.

At a value of 0, you get a straight line (in effect, an octagonal shape). If you increase the value to about 0.55, you get a near-perfect circular corner. As the value increases towards 1, the corner gets tighter / smaller. Once the value starts to go above 1, some other interesting corner shapes start to form.
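
The behaviour described above can be sketched as follows. This is an illustration only, under the assumption that each control point sits at a distance of radius x Circularity from its end of the curve; DrawRoundRectangle in PDFLibrary.cs may calculate things differently, and every name below is made up for the example. The value 0.55 is close to 4(sqrt(2) - 1)/3, roughly 0.5523, the constant commonly used to approximate a quarter circle with a cubic Bezier curve.

```csharp
using System;
using System.Globalization;
using System.Text;

// Hypothetical sketch; DrawRoundRectangle in PDFLibrary.cs may work differently.
public static class RoundCornerSketch
{
    // Circularity at which a cubic Bezier closely approximates a quarter circle.
    public static readonly double CircleConstant = 4.0 * (Math.Sqrt(2.0) - 1.0) / 3.0;

    // Formats a number for PDF markup, always with '.' as the decimal separator.
    private static string N(double v)
    {
        return v.ToString("0.###", CultureInfo.InvariantCulture);
    }

    // One 'm' (move to) or 'l' (line to) operator.
    private static string Point(double x, double y, string op)
    {
        return N(x) + " " + N(y) + " " + op + "\r\n";
    }

    // One 'c' operator: control points (x2,y2) and (x3,y3), then end point (x4,y4).
    private static string Curve(double x2, double y2, double x3, double y3, double x4, double y4)
    {
        return N(x2) + " " + N(y2) + " " + N(x3) + " " + N(y3) + " "
             + N(x4) + " " + N(y4) + " c\r\n";
    }

    // Path for a rounded rectangle with lower left (x, y), size w x h,
    // corner radius r and circularity k: four lines joined by four curves.
    public static string BuildPath(double x, double y, double w, double h, double r, double k)
    {
        double c = r * k;            // offset of each control point from its curve end
        double right = x + w;
        double top = y + h;
        StringBuilder sb = new StringBuilder();
        sb.Append(Point(x + r, y, "m"));                                     // start of bottom edge
        sb.Append(Point(right - r, y, "l"));                                 // bottom edge
        sb.Append(Curve(right - r + c, y, right, y + r - c, right, y + r));  // lower-right corner
        sb.Append(Point(right, top - r, "l"));                               // right edge
        sb.Append(Curve(right, top - r + c, right - r + c, top, right - r, top)); // upper-right corner
        sb.Append(Point(x + r, top, "l"));                                   // top edge
        sb.Append(Curve(x + r - c, top, x, top - r + c, x, top - r));        // upper-left corner
        sb.Append(Point(x, y + r, "l"));                                     // left edge
        sb.Append(Curve(x, y + r - c, x + r - c, y, x + r, y));              // lower-left corner
        sb.Append("h\r\nB\r\n");     // close the path, then fill and stroke it
        return sb.ToString();
    }
}
```

With k = 0 both control points coincide with the ends of each curve, so the corner degenerates to a straight line; with k = 1 both control points sit on the corner of the bounding rectangle, which gives the tighter corner.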

So that's all there is to it. This code assumes that the final document is one page and has a fixed number of lines of text in a text box. However, it would not be too hard to combine the textAndtable.AddRow method with the DrawRoundRectangle method to calculate the vertical dimensions of the textbox dynamically and wrap it across multiple pages if you needed to.

Drawing the straight lines

Drawing the lines inside a box (perhaps to designate columns) can be done using the textAndtable class if the text is tabular, or you can use the line.DrawLine method. This simply accepts the start of line (xs,ys), end of line (xe,ye) coordinates plus the line width and colour and adds the markup to draw a line to the PDF content stream. This is useful for separating individual text elements.
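
For readers curious about what such line markup looks like, here is a minimal sketch. It is not the library's line.DrawLine implementation; the names are invented, and it assumes colours are supplied as 0-255 values in the way ColorSpec is used earlier in the article.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch, not the DrawLine method from PDFLibrary.cs.
public static class LineSketch
{
    // Formats a number for PDF markup, always with '.' as the decimal separator.
    private static string N(double v)
    {
        return v.ToString("0.###", CultureInfo.InvariantCulture);
    }

    // Returns markup that strokes a line from (xs, ys) to (xe, ye).
    public static string Line(double xs, double ys, double xe, double ye,
                              double width, int red, int green, int blue)
    {
        return "q\r\n"                                            // save graphics state
             + N(width) + " w\r\n"                                // line width
             + N(red / 255.0) + " " + N(green / 255.0) + " "
             + N(blue / 255.0) + " RG\r\n"                        // stroke colour, 0..1 per component
             + N(xs) + " " + N(ys) + " m\r\n"                     // move to the start point
             + N(xe) + " " + N(ye) + " l\r\n"                     // line to the end point
             + "S\r\n"                                            // stroke the path
             + "Q\r\n";                                           // restore graphics state
    }
}
```

The returned string could then be passed to content.SetStream in the same way as the round rectangle markup.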

Adding an image to the PDF document

Displaying an image on a PDF page is a much more involved process than creating a 'round rectangle'. As an image cannot be mathematically specified, we need to provide the PDF with more information before it can be rendered.

There are three parts to adding an image to a PDF. These are shown in the table describing a simple PDF markup at the start of this article.

Create the index
Create the parameters and byte data that describes the image
Draw the image to the document

Let's look in more detail at what this involves.

PDF Markup	What it means as far as adding an image is concerned
8 0 obj << /Type /Page/Parent 2 0 R /Rotate 0 /MediaBox [0 0 595 842]/CropBox [0 0 595 842] /Resources<</ProcSet [/PDF/Text] /Font<</T1 3 0 R/T2 4 0 R/T3 5 0 R/T4 6 0 R>> `/XObject <</I1 10 0 R >>`>> /Contents 9 0 R >> endobj	Firstly we need to tell the PDF where to find the data that describes the image. This reference is created by the `AddImageResource` method and written out as a part of the `GetPageDict` method. In this case we are telling the PDF that the data that describes the image called 'I1' can be found in the object numbered 10 0.
9 0 obj<</Length 989 >>stream`q 144 0 0 100 300 700 cm 1 0 0 1 0 0 cm /I1 Do Q` BT/T3 12 Tf 105 699 Td (Round Rectangle Header) Tj ET endstream endobj	Now we need to give the PDF some information as to where on the page to place the image. The q and Q save and restore the graphics state (see PDF manual for full details of syntax). The markup between them describes where to place the image relative to the page, and its width and height on the page. If you look in the PDF manual, there is also a whole host of transformations that you can apply to an image, for example rotation, scaling, skewing and many other more advanced features. These would need to be added to the code if you wished to use them. This markup is created by the `ShowImage` method and added to the content stream using `SetStream`.
`10 0 obj <</Name /I1 /Type /XObject /Subtype /Image /Width 144 /Height 100 /Length 29779 /Filter /DCTDecode /ColorSpace /DeviceRGB /BitsPerComponent 8 >> stream [ byte data to represent the jpg image ] endstream endobj`	Finally we need to describe the actual image. This is the `CreateImageDict` method. The PDF needs details such as the name, pixel dimensions, data compression type (jpg, gif, png, tif etc. all have different compression methods), colour space (e.g. RGB or CMYK), the number of bits required to describe each pixel colour component and finally the byte data that makes up the image.

Only a few lines of code are needed to add the image to the PDF.

String ImagePath = Server.MapPath("myimage.jpg"); //file path to image source
ImageDict I1 = new ImageDict();                   //new image dictionary object
I1.CreateImageDict("I1", ImagePath); //create the object which describes 
                                     //the image    
page.AddImageResource(I1.PDFImageName, I1, content.objectNum); //which object 
                                     //within the PDF contains the image data
PageImages pi = new PageImages();
content.SetStream(pi.ShowImage("I1", 300, 700, 144, 100));     //draw an image 
                                     //called 'I1', where and what size

Once we have created the data, we need to write it to the physical PDF file:

file.Write(I1.GetImageDict(file.Length, out size), 0, size);

The code behind adding an image

First we need to define where the image is on the file system.

Secondly, we need to add this data to the PDF in the form of an object (10 0 in the table above). This is perhaps the most complex part of the process. Luckily for me, Zainu had already done most of the hard work by creating a framework which keeps track of object numbers and the other main parts of a PDF. I have simply added some more methods specifically to handle images.

In order to create the object which contains the data for the image, we must remember that a PDF file is in fact binary by nature. The markup is created as Unicode (16-bit) strings, whereas the actual data representing the image consists of 8-bit bytes in the case of my example. This means that we have to handle the byte output slightly differently to create the object bytes.

In essence, we do this in three parts. We send the first part of the object (X 0 obj ... stream), converted to byte data, as imageDictStart; then the actual byte data of the image, imagebytes; and then the last part of the object (endstream endobj), imageDictEnd, to the PDF stream.

This is coded in the methods GetImageDict and GetImageBytes.
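
The three-part assembly can be sketched like this. It is an illustration under stated assumptions rather than the code in GetImageDict: the class and method names are invented, the pixel dimensions are passed in instead of being read from the file, and the image is assumed to be an RGB JPG.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical sketch of the three-part write; not the library's GetImageDict.
public static class ImageObjectSketch
{
    public static byte[] Build(int objectNumber, string name, int width, int height, byte[] jpegBytes)
    {
        // Part 1: the markup up to and including 'stream', as text.
        string start = objectNumber + " 0 obj\r\n<</Name /" + name
            + " /Type /XObject /Subtype /Image /Width " + width
            + " /Height " + height
            + " /Length " + jpegBytes.Length
            + " /Filter /DCTDecode /ColorSpace /DeviceRGB /BitsPerComponent 8 >>\r\nstream\r\n";

        // Part 3: the markup that closes the object, as text.
        string end = "\r\nendstream\r\nendobj\r\n";

        using (MemoryStream ms = new MemoryStream())
        {
            byte[] startBytes = Encoding.ASCII.GetBytes(start);   // one byte per character
            byte[] endBytes = Encoding.ASCII.GetBytes(end);
            ms.Write(startBytes, 0, startBytes.Length);
            ms.Write(jpegBytes, 0, jpegBytes.Length);              // Part 2: raw image bytes, untouched
            ms.Write(endBytes, 0, endBytes.Length);
            return ms.ToArray();
        }
    }
}
```

The point of the design is that the image bytes never pass through a string, so no character encoding can alter them.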

CreateImageDict opens the jpg as a bitmap to get the pixel dimensions and then puts the byte data into an array. Finally it adds the parameters such as the image name, pixel dimensions etc into the string imageDictStart ready for writing to the page later.

The next thing we need to do is to add the reference to this object into the page index. This is the markup /XObject <</I1 10 0 R >> in the example above.

This is written in AddImageResource to the string imageRef which is later used by GetPageDict to create the PDF page index.
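
To make the shape of that reference concrete, here is a minimal sketch that builds the /XObject entry for one or more images. The names are hypothetical and the real AddImageResource may store its data differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical sketch; not the AddImageResource method from PDFLibrary.cs.
public static class ImageRefSketch
{
    // Each pair is (image name, number of the object holding the image data).
    public static string Build(IList<KeyValuePair<string, int>> images)
    {
        StringBuilder sb = new StringBuilder("/XObject <<");
        foreach (KeyValuePair<string, int> image in images)
        {
            sb.Append("/" + image.Key + " " + image.Value + " 0 R ");
        }
        sb.Append(">>");
        return sb.ToString();
    }
}
```

For a single image named I1 stored in object 10, this produces the same markup as in the example above.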

Now all the hard work is done, we just need to let the PDF know that we would like to display the image on the page. This is done using markup such as

    q
    144 0 0 100 300 700 cm
    1 0 0 1 0 0 cm
    /I1 Do
    Q

and is added to the PDF using PageImages.ShowImage

    content.SetStream(pi.ShowImage("I1", 300, 700, 144, 100));

This could be written directly to the content stream; however, I implemented it as a separate class in case I wanted to add some transformations to the image later.

The first parameter is the image name, the second pair are the (x,y) coordinates where the lower left of the image should be placed and the last pair are the width and height of the image on the page.
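
The mapping from these parameters to markup can be sketched as below. This is not the library's PageImages.ShowImage; the names are made up, and the sketch leaves out the second cm line from the markup above, which is an identity matrix and does not move or scale anything.

```csharp
using System;
using System.Globalization;

// Hypothetical sketch; not the PageImages class from PDFLibrary.cs.
public static class ShowImageSketch
{
    // Formats a number for PDF markup, always with '.' as the decimal separator.
    private static string N(double v)
    {
        return v.ToString("0.###", CultureInfo.InvariantCulture);
    }

    // An image is drawn in a 1 x 1 unit square, so the matrix
    // [width 0 0 height x y] scales it to size and moves it into place.
    public static string ShowImage(string name, double x, double y, double width, double height)
    {
        return "q\r\n"
             + N(width) + " 0 0 " + N(height) + " " + N(x) + " " + N(y) + " cm\r\n"
             + "/" + name + " Do\r\n"
             + "Q\r\n";
    }
}
```

Other transformations, such as rotation, would change the four leading numbers of the matrix.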

You should now have an image added to your PDF.

Other image types

I have only tested this on RGB JPGs. Other image types have different compression methods and hence different decompression methods, different colour spaces and different bit component levels. If you want to try the method with different image types, you could start by changing the decode method in CreateImageDict. I think you can find the required information in the PDF manual. Feel free to post your findings in the discussion below.

Displaying the PDF as part of a Web Application

Now that we have a PDF, we probably want to show it somewhere.

In many cases, we can simply do this by using a hyperlink to the file itself. This will be fine for many applications, but what if we want to restrict the file to certain users of a system?

You could set up file permissions on a network, but this is not an option on a public website (for instance).

I decided to use a very simple technique using an <iframe>. This can easily be used to display an object inside a web page simply by providing the file name and the details of the application that will open the file (in this case application/PDF).

However, there is another way to use this. Instead of specifying the actual pdf file, we can specify an .aspx file which will serve the byte data of the file. In this case, we can now determine who the requestor is and determine if they have permission to view the file before serving it.

If you look towards the bottom of default.aspx, you will see that the <iframe> calls streampdf.aspx for the file data source.

Have a look at the structure of the output that streampdf.aspx generates. It does not send any HTML headers etc., only the application type followed by the bytes of the file specified.

This version simply sends the bytes from the hardcoded file name, however, you could specify a reference to perhaps a primary key in a database which contains the actual file name/path to serve. You could even store the binary data of the PDF in the database itself if you wished. In this way you can control who sees the file.
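
The permission check could be sketched along these lines. Everything here is an assumption made for illustration: the class, the lookup from a key to a file path, and the list of allowed users are invented, and a real application would use its own database and authentication.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical sketch; streampdf.aspx.cs in the download does not do this.
public static class PdfGateSketch
{
    // Returns the PDF bytes when the user may see the document, otherwise null.
    public static byte[] LoadForUser(string userName, string documentKey,
        IDictionary<string, string> pathsByKey, ICollection<string> allowedUsers)
    {
        if (userName == null || !allowedUsers.Contains(userName))
        {
            return null;                      // unknown or unauthorised requestor
        }
        string path;
        if (documentKey == null || !pathsByKey.TryGetValue(documentKey, out path))
        {
            return null;                      // no such document key
        }
        if (!File.Exists(path))
        {
            return null;                      // record exists but the file is missing
        }
        return File.ReadAllBytes(path);
    }
}

// In the page itself, the result would be sent with something like:
//   Response.ContentType = "application/pdf";
//   Response.BinaryWrite(bytes);
// or an error status returned when the result is null.
```

Because the browser only ever receives a key, the real file name and location stay hidden on the server.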

Note that .NET 2.0 web apps have a special folder called App_Data. This is specifically designed for storing such files, as anyone browsing to a file in this folder will be returned a message that 'The system cannot find the file specified'.

Further development?

I have tried to use flate compression to reduce the size of the page dictionary, so far unsuccessfully. I gather that the MS implementation of the flate compression algorithm is not the same as the Adobe version. If anyone manages to work out how to use it, please post it!
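
One possible explanation, offered as an assumption rather than a tested fix for this library: the PDF /FlateDecode filter expects data in the zlib format (a two-byte header, the deflate data, then an Adler-32 checksum), whereas .NET's DeflateStream writes only the raw deflate data. The sketch below wraps DeflateStream output in a zlib header and checksum; the class name is made up and the code has not been tried against PDFLibrary.cs.

```csharp
using System;
using System.IO;
using System.IO.Compression;

// Hypothetical sketch; not part of PDFLibrary.cs.
public static class FlateSketch
{
    // Adler-32 checksum of the uncompressed data, as required by the zlib format.
    public static uint Adler32(byte[] data)
    {
        uint a = 1, b = 0;
        foreach (byte value in data)
        {
            a = (a + value) % 65521;
            b = (b + a) % 65521;
        }
        return (b << 16) | a;
    }

    // Returns zlib-format data: header, raw deflate stream, big-endian checksum.
    public static byte[] ZlibCompress(byte[] data)
    {
        using (MemoryStream output = new MemoryStream())
        {
            output.WriteByte(0x78);           // deflate method, 32K window
            output.WriteByte(0x9C);           // flags, chosen so the header checks out
            using (DeflateStream deflate = new DeflateStream(output, CompressionMode.Compress, true))
            {
                deflate.Write(data, 0, data.Length);
            }                                 // disposing flushes the deflate data
            uint checksum = Adler32(data);
            output.WriteByte((byte)(checksum >> 24));
            output.WriteByte((byte)(checksum >> 16));
            output.WriteByte((byte)(checksum >> 8));
            output.WriteByte((byte)checksum);
            return output.ToArray();
        }
    }
}
```

If this approach were used, the stream dictionary would also need /Filter /FlateDecode and a /Length equal to the size of the compressed bytes.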

The ability to use other image formats (gif, png, tif etc) would also be a bonus. If anyone manages to add that in successfully, please post it here.

If you would like to use this code...

In the spirit of CodeProject, go ahead and use the code; there are no licensing conditions. Just remember the code is provided 'as is' and if it breaks your application I take no responsibility. If you successfully use the code in an application, I would appreciate a mention, or just send me an email and let me know it works!

History

May 2007 - Article first published.