PDF file is damaged and could not be repaired when moving a MemoryStream to a FileStream

Keywords: file, move, memory, PDF, damaged | Updated: 2023-09-27 18:36:28

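I am using iTextSharp with VB.NET to stamp images onto PDF documents. (Since this is not language-specific, I have tagged it for C# as well.) I have two applications that use this process.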

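  • The first uses the bytes from the memory stream to display the PDF file online. This part works.

  • The second uses the same function but saves the PDF to a file. This part produces an invalid PDF.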

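I have seen a few similar questions, but they all create a document from scratch and have a Document object in their code. Their memory streams are corrupted from the start. My code has no Document object, and my original memory stream opens fine.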

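This is where I get the error. (I have to put the buffer from m into a new memory stream because the stamper in the fillPDF function closes the stream by default unless flagged otherwise.)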

Dim m As MemoryStream = PDFHelper.fillPDF(filename, Nothing, markers, "")
Dim m2 As New MemoryStream(m.GetBuffer, 0, m.GetBuffer.Length)
Dim f As FileStream = New FileStream("C:\temp.pdf", FileMode.Create)
m2.CopyTo(f, m.GetBuffer.Length)
m2.Close()
f.Close()

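Here is one of the places where I use it successfully on the website. This one does not use images, although some other similar, equally successful places do stamp images on multiple documents that are then merged together.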

Dim m As System.IO.MemoryStream = PDFHelper.fillPDF(filename, New Dictionary(Of String, String), New List(Of PDFHelper.PDfImage), "SAMPLE")
Dim data As Byte() = m.GetBuffer
Response.Clear()
' Send the file to the output stream
Response.Buffer = True
' Try and ensure the browser always opens the file and doesn't just prompt to "open/save".
Response.AddHeader("Content-Length", data.Length.ToString())
Response.AddHeader("Content-Disposition", "inline; filename=" + "Sample")
Response.AddHeader("Expires", "0")
Response.AddHeader("Pragma", "cache")
Response.AddHeader("Cache-Control", "private")
' Set the output stream to the correct content type (PDF).
Response.ContentType = "application/pdf"
Response.AddHeader("Accept-Ranges", "bytes")
' Output the file
Response.BinaryWrite(data)
' Flushing the Response to display the serialized data to the client browser.
Response.Flush()
Try
    Response.End()
Catch ex As Exception
    Throw ex
End Try

Here is the function in my utility class (PDFHelper.fillPDF):

  Public Shared Function fillPDF(fileToFill As String, Optional fieldValues As Dictionary(Of String, String) = Nothing, Optional images As List(Of PDfImage) = Nothing, Optional watermarkText As String = "") As MemoryStream
        Dim m As MemoryStream = New MemoryStream() ' for storing the pdf
        Dim reader As PdfReader = New PdfReader(fileToFill) ' for reading the document
        Dim outStamper As PdfStamper = New PdfStamper(reader, m) ' for filling the document
        If fieldValues IsNot Nothing Then
            For Each kvp As KeyValuePair(Of String, String) In fieldValues
                outStamper.AcroFields.SetField(kvp.Key, kvp.Value)
            Next
        End If

        If images IsNot Nothing AndAlso images.Count > 0 Then ' add all the images
            For Each PDfImage In images
                Dim img As iTextSharp.text.Image = Nothing ' image to stamp
                ' set up the image (different for different cases)
                Select Case PDfImage.ImageType
                    ' removed for brevity
                End Select
                Dim overContent As PdfContentByte = outStamper.GetOverContent(PDfImage.PageNumber) ' specify page number for stamping
                overContent.AddImage(img)
            Next
        End If
        ' add the watermark
        If watermarkText <> "" Then
            Dim underContent As iTextSharp.text.pdf.PdfContentByte = Nothing
            Dim watermarkRect As iTextSharp.text.Rectangle = reader.GetPageSizeWithRotation(1)
            ' removed for brevity
        End If
        ' flatten and close out
        outStamper.FormFlattening = True
        outStamper.SetFullCompression()
        outStamper.Close()
        reader.Close()
        Return m
  End Function


Since your code is streaming the PDF, a simple way to solve the problem is to make a small change to your fillPDF method and have it return a byte array:

// other parameters left out for simplicity sake  
public static byte[] fillPDF(string resource) {
  PdfReader reader = new PdfReader(resource);
  using (var ms = new MemoryStream()) {
    using (PdfStamper stamper = new PdfStamper(reader, ms)) {
      // do whatever you need to do
    }
    return ms.ToArray();
  }      
}

Then you can both stream the byte array to the client from ASP.NET and save it to the file system:

// get the manipulated PDF    
byte[] myPdf = fillPDF(inputFile);
// stream via ASP.NET
Response.BinaryWrite(myPdf);
// save to file system
File.WriteAllBytes(outputFile, myPdf);

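If you are generating the PDF from a standard ASP.NET web form, don't forget to call Response.End() after writing the PDF; otherwise HTML markup junk will be appended to the end of the byte array.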
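
Some background on why returning a byte array helps: MemoryStream.GetBuffer() returns the stream's whole internal array, including any allocated but unused bytes, while MemoryStream.ToArray() returns only the bytes actually written. Both can be called after the stream has been closed. Padding bytes after the end of the PDF data are a likely cause of the "damaged" error in the saved file, though that is an inference from the question rather than something confirmed there. A minimal VB.NET sketch of the difference (the BufferDemo module and the three sample bytes are purely illustrative):

Imports System
Imports System.IO

Module BufferDemo ' hypothetical module name, for illustration only
    Sub Main()
        Dim ms As New MemoryStream()
        ms.Write(New Byte() {1, 2, 3}, 0, 3)
        ms.Close() ' mirrors the stamper closing its output stream by default

        ' GetBuffer exposes the whole internal array, so its length is the
        ' stream's capacity, which can exceed the number of bytes written.
        Dim raw As Byte() = ms.GetBuffer()

        ' ToArray copies only the bytes that were written, and it still
        ' works after the stream has been closed.
        Dim exact As Byte() = ms.ToArray()

        Console.WriteLine("GetBuffer length: {0}", raw.Length)
        Console.WriteLine("ToArray length:   {0}", exact.Length)
    End Sub
End Module

Applied to the question's code, this would suggest writing m.ToArray() to disk (for example with File.WriteAllBytes) instead of copying m.GetBuffer into a second stream.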

This copies an existing PDF into a MemoryStream and then saves it to disk. Perhaps you can adapt it to solve your problem?

  Private Sub Button1_Click(sender As System.Object, e As System.EventArgs) Handles Button1.Click
    Dim strInputFilename As String = "C:\Junk\Junk.pdf"
    Dim strOutputFilename As String = "C:\Junk\Junk2.pdf"
    Dim byt() As Byte
    Using ms As New MemoryStream
      '1. Load PDF into memory stream'
      Using bw As New BinaryWriter(ms)
        Using fsi As New FileStream(strInputFilename, FileMode.Open)
          Using br As New BinaryReader(fsi)
            Try
              Do
                bw.Write(br.ReadByte())
              Loop
            Catch ex As EndOfStreamException
            End Try
          End Using
        End Using
      End Using
      byt = ms.ToArray()
    End Using
    '2. Write memory copy of PDF back to disk'
    My.Computer.FileSystem.WriteAllBytes(strOutputFilename, byt, False)
    Process.Start(strOutputFilename)
  End Sub
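
The loop above reads one byte at a time and relies on EndOfStreamException to stop. A shorter sketch of the same round trip, assuming .NET Framework 4 or later where Stream.CopyTo is available, might look like this. The CopyDemo module, the CopyThroughMemory helper and the temporary file names are hypothetical and not part of the original answer.

Imports System
Imports System.IO

Module CopyDemo ' hypothetical module name, for illustration only
    ' Copies a file into a MemoryStream, then writes the exact bytes back out.
    Sub CopyThroughMemory(inputPath As String, outputPath As String)
        Dim bytes As Byte()
        Using ms As New MemoryStream()
            Using source As New FileStream(inputPath, FileMode.Open, FileAccess.Read)
                source.CopyTo(ms) ' copies until the end of the source stream
            End Using
            bytes = ms.ToArray() ' only the bytes written, no spare capacity
        End Using
        File.WriteAllBytes(outputPath, bytes)
    End Sub

    Sub Main()
        ' Stand-in input file: four bytes spelling "%PDF", not a real PDF.
        Dim src As String = Path.GetTempFileName()
        File.WriteAllBytes(src, New Byte() {37, 80, 68, 70})
        Dim dst As String = src & ".copy"

        CopyThroughMemory(src, dst)
        Console.WriteLine("Copied {0} bytes", File.ReadAllBytes(dst).Length)
    End Sub
End Module

If no in-memory processing is needed at all, File.ReadAllBytes followed by File.WriteAllBytes would do the same job in two calls.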