Getting a "Parameter is not valid" error while extracting images from a PDF file

What I have tried:

Hi,
I am extracting images from a PDF. It has 3 images per page and 500+ pages in total.

I have tried

C#
for (int pageNumber = 4; pageNumber <= pdf.NumberOfPages; pageNumber++)
{
    pg1 = pageNumber;

    // Look up the page's XObject dictionary, which is where images live.
    PdfDictionary pg = pdf.GetPageN(pageNumber);
    PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
    PdfDictionary xobj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.XOBJECT));
    if (xobj != null)
    {
        foreach (PdfName name in xobj.Keys.OrderByDescending(x => x.IndRef))
        {
            try
            {
                PdfObject obj = xobj.Get(name);
                if (obj.IsIndirect())
                {
                    PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
                    PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
                    if (PdfName.IMAGE.Equals(type))
                    {
                        int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(System.Globalization.CultureInfo.InvariantCulture));
                        PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
                        PdfStream pdfStrem = (PdfStream)pdfObj;

                        // Raw bytes are still encoded with the stream's PDF filter.
                        byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)pdfStrem);

                        if (bytes != null)
                        {
                            using (System.IO.MemoryStream memStream = new System.IO.MemoryStream(bytes))
                            {
                                memStream.Position = 0;
                                // The error "Parameter is not valid" is thrown on the next line.
                                System.Drawing.Image img = System.Drawing.Image.FromStream(memStream);

                                if (!Directory.Exists(outputPath + foldername))
                                    Directory.CreateDirectory(outputPath + foldername);
                                string path = Path.Combine(outputPath + foldername, String.Format(@"{0}.jpg", lstField[i].applicationNo.Trim()));

                                System.Drawing.Imaging.EncoderParameters parms = new System.Drawing.Imaging.EncoderParameters(1);
                                parms.Param[0] = new System.Drawing.Imaging.EncoderParameter(System.Drawing.Imaging.Encoder.Compression, 0);

                                System.Drawing.Imaging.ImageCodecInfo jpegEncoder = GetImageEncoder("JPEG");
                                img.Save(path, jpegEncoder, parms);
                            }
                        }
                    }
                }
            }
            catch (Exception ex)
            {
                // Exceptions are swallowed here, so failed images are skipped silently.
            }
        }
    }
}
Updated 28-Sep-21 21:32pm

1 solution

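Your code makes a lot of assumptions about what constitutes an image in a PDF. PDF images aren't image files in the usual sense; the stream holds colour sample data encoded with whatever filter the PDF uses, so the raw bytes are only a valid image file in special cases such as JPEG-encoded (DCTDecode) streams. Passing anything else to Image.FromStream is what raises "Parameter is not valid". Why not use PdfImageObject instead? It is designed to read PDF "images" and cope with these differences, and its GetDrawingImage method returns the underlying System.Drawing.Image. An example of doing this might look something like this: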
C#
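// Requires: using System.Drawing; using iTextSharp.text.pdf; using iTextSharp.text.pdf.parser;
// As an extension method, this must be declared inside a static class.
// If null is returned, we can't find or convert an image.
public static Image GetImage(this PdfObject pdfObject)
{
  // Pass the resolved object (PdfReader.GetPdfObject(obj)), not the indirect reference.
  PRStream stream = pdfObject as PRStream;
  if (stream == null) return null;

  PdfObject streamType = PdfReader.GetPdfObject(stream.Get(PdfName.SUBTYPE));
  if (streamType == null) return null;

  if (PdfName.IMAGE.Equals(streamType))
  {
    // PdfImageObject decodes the stream according to its filters.
    PdfImageObject image = new PdfImageObject(stream);
    return image.GetDrawingImage();
  }
  return null;
}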
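
If you want to confirm why the original approach failed, a quick check on the raw bytes can help. This is only a minimal sketch: LooksLikeJpeg is a hypothetical helper, not part of iTextSharp, and it assumes the only question is whether the bytes returned by GetStreamBytesRaw already form a JPEG file (which start with the bytes FF D8 FF). Streams that fail this check are the ones Image.FromStream is likely to reject.

```csharp
public static class ImageSignature
{
    // Hypothetical helper (not an iTextSharp API): returns true when the
    // buffer begins with the JPEG start-of-image signature FF D8 FF.
    public static bool LooksLikeJpeg(byte[] bytes)
    {
        return bytes != null
            && bytes.Length >= 3
            && bytes[0] == 0xFF
            && bytes[1] == 0xD8
            && bytes[2] == 0xFF;
    }
}
```

In the question's loop, the result of ImageSignature.LooksLikeJpeg(bytes) could be logged before the Image.FromStream call to see which images on which pages are not JPEG-encoded.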
 
 
Comments
Noman Suleman 30-Sep-21 1:04am    
Thanks, let me try this.

This content, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


