Using iTextSharp to split a PDF into smaller PDFs based on size

Keywords: pdf, size, itextsharp, split | Updated: 2023-09-27 18:21:47

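We have some really inefficient code that splits a PDF into smaller chunks based on a maximum allowed size. That is, if the maximum size is 10 MB, an 8 MB file is skipped, while a 16 MB file is split at page boundaries.

This is code that I inherited, and I feel there must be a more efficient way to do this, using a single method and fewer object instantiations.

We use the following code to call the methods: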

        List<int> splitPoints = null;
        List<byte[]> documents = null;
        splitPoints = this.GetPDFSplitPoints(currentDocument, maxSize);
        documents = this.SplitPDF(currentDocument, maxSize, splitPoints);

The methods:

    private List<int> GetPDFSplitPoints(IClaimDocument currentDocument, int maxSize)
    {
        List<int> splitPoints = new List<int>();
        PdfReader reader = null;
        Document document = null;
        int pagesRemaining = currentDocument.Pages;
        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));
            using (MemoryStream ms = new MemoryStream())
            {
                PdfCopy copy = new PdfCopy(document, ms);
                PdfImportedPage page = null;
                document.Open();
                //Add pages until we run out from the original
                for (int i = 0; i < currentDocument.Pages; i++)
                {
                    int currentPage = currentDocument.Pages - (pagesRemaining - 1);
                    if (pagesRemaining == 0)
                    {
                        //The whole document has been traversed
                        break;
                    }
                    page = copy.GetImportedPage(reader, currentPage);
                    copy.AddPage(page);
                    //If the current collection of pages exceeds the maximum size, we save off the index and start again
                    if (copy.CurrentDocumentSize > maxSize)
                    {
                        if (i == 0)
                        {
                            //One page is greater than the maximum size
                            throw new Exception("one page is greater than the maximum size and cannot be processed");
                        }
                        //We have gone one page too far, save this split index   
                        splitPoints.Add(currentDocument.Pages - (pagesRemaining - 1));
                        break;
                    }
                    else
                    {
                        pagesRemaining--;
                    }
                }
                page = null;
                document.Close();
                document.Dispose();
                copy.Close();
                copy.Dispose();
                copy = null;
            }
        }
        if (reader != null)
        {
            reader.Close();
            reader = null;
        }
        document = null;
        return splitPoints;
    }
    private List<byte[]> SplitPDF(IClaimDocument currentDocument, int maxSize, List<int> splitPoints)
    {
        var documents = new List<byte[]>();
        PdfReader reader = null;
        Document document = null;
        MemoryStream fs = null;
        int pagesRemaining = currentDocument.Pages;
        while (pagesRemaining > 0)
        {
            reader = new PdfReader(currentDocument.Data);
            document = new Document(reader.GetPageSizeWithRotation(1));
            fs = new MemoryStream();
            PdfCopy copy = new PdfCopy(document, fs);
            PdfImportedPage page = null;
            document.Open();
            //Add pages until we run out from the original
            for (int i = 0; i <= currentDocument.Pages; i++)
            {
                int currentPage = currentDocument.Pages - (pagesRemaining - 1);
                if (pagesRemaining == 0)
                {
                    //We have traversed all pages
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }
                page = copy.GetImportedPage(reader, currentPage);
                copy.AddPage(page);
                pagesRemaining--;
                if (splitPoints.Contains(currentPage + 1) == true)
                {
                    //Need to start a new document
                    //The call to copy.Close() MUST come before using fs.ToArray() because copy.Close() finalizes the document
                    fs.Flush();
                    copy.Close();
                    documents.Add(fs.ToArray());
                    document.Close();
                    fs.Dispose();
                    break;
                }
            }
            copy = null;
            page = null;
            fs.Dispose();
        }
        if (reader != null)
        {
            reader.Close();
            reader = null;
        }
        if (document != null)
        {
            document.Close();
            document.Dispose();
            document = null;
        }
        if (fs != null)
        {
            fs.Close();
            fs.Dispose();
            fs = null;
        }
        return documents;
    }

As far as I can tell, the only code I can find online is in VB, and it doesn't necessarily address the size issue.

UPDATE:

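We are running into OutOfMemory exceptions, and I believe the Large Object Heap is the cause. So one idea is to reduce the code's footprint, which might reduce the number of large objects on the heap.

Basically, this is part of a loop that goes through any number of PDFs, splits them, and stores them in the database. Right now, we have had to change the process from doing all of them at once (the last run was 97 PDFs of various sizes) to running 5 PDFs through the system every 5 minutes. This is not ideal, and it won't scale well as we roll the tool out to more clients.

(We are dealing with 50-100 MB PDFs, but they could be larger.)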
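
One way to relieve the Large Object Heap pressure described above might be to write each chunk to a temporary file instead of a MemoryStream. In .NET, byte arrays of roughly 85,000 bytes or more are allocated on the Large Object Heap, a growing MemoryStream reallocates its buffer several times, and ToArray() makes one more copy. The following is only a sketch under assumptions: `PdfChunkWriter`, `WriteChunk`, and `outputPath` are hypothetical names, temporary disk space is assumed to be available, and the source PDF is assumed to be open already in a single PdfReader.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfChunkWriter
{
    // Hypothetical helper: copies pages first..last (1-based, inclusive) of an
    // already-open reader into a new PDF at outputPath and returns the size of
    // the finished file in bytes.
    public static long WriteChunk(PdfReader reader, int first, int last, string outputPath)
    {
        using (var fs = new FileStream(outputPath, FileMode.Create, FileAccess.Write))
        {
            var document = new Document(reader.GetPageSizeWithRotation(first));
            var copy = new PdfCopy(document, fs);
            document.Open();
            for (int page = first; page <= last; page++)
            {
                copy.AddPage(copy.GetImportedPage(reader, page));
            }
            // Closing the document finalizes the PDF; by default it also
            // closes the underlying stream.
            document.Close();
        }
        // Measure the file only after the document has been closed.
        return new FileInfo(outputPath).Length;
    }
}
```

The caller could then stream each file into the database and delete it, so that only the source document stays in memory as a large array.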

Using iTextSharp to split a PDF into smaller PDFs based on size

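I also inherited this exact code, and there appears to be a major flaw in it. In the GetPDFSplitPoints method, it checks the total size of the copied pages against maxSize to determine at which page to split the file.

In the SplitPDF method, when it reaches the page where the split occurs, the MemoryStream at that point is indeed below the maximum allowed size, and one more page would push it over the limit. But after document.Close() is executed, much more is added to the MemoryStream (in one example PDF I worked with, the Length of the MemoryStream went from 9 MB to 19 MB across the call to document.Close()). My understanding is that all the resources needed by the copied pages are added on Close.

I guess I'll have to rewrite this code completely to make sure it doesn't exceed the maximum size while preserving the integrity of the original pages.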
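
A rewrite along those lines could measure each chunk only after it has been closed, since that is when the final size is known. The sketch below is not a tested implementation, and `PdfSizeSplitter`, `SplitBySize`, and `BuildChunk` are hypothetical names. It assumes that one PdfReader can be reused across several PdfCopy instances, that a chunk never gets smaller when more pages are added (which is what lets a binary search find the last page that fits), and that a document already under the limit can be returned unchanged.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfSizeSplitter
{
    // Splits pdf into chunks whose finished size is at most maxSize bytes.
    public static List<byte[]> SplitBySize(byte[] pdf, int maxSize)
    {
        var chunks = new List<byte[]>();
        if (pdf.Length <= maxSize)
        {
            chunks.Add(pdf); // assumption: small documents pass through as-is
            return chunks;
        }

        var reader = new PdfReader(pdf); // parsed once, reused for every chunk
        try
        {
            int total = reader.NumberOfPages;
            int first = 1;
            while (first <= total)
            {
                byte[] best = null;
                int bestLast = 0;
                int lo = first, hi = total;
                // Binary search for the largest last page that still fits.
                while (lo <= hi)
                {
                    int mid = lo + (hi - lo) / 2;
                    byte[] candidate = BuildChunk(reader, first, mid);
                    if (candidate.Length <= maxSize)
                    {
                        best = candidate;
                        bestLast = mid;
                        lo = mid + 1;
                    }
                    else
                    {
                        hi = mid - 1;
                    }
                }
                if (best == null)
                {
                    throw new InvalidOperationException(
                        "Page " + first + " alone exceeds the maximum size.");
                }
                chunks.Add(best);
                first = bestLast + 1;
            }
        }
        finally
        {
            reader.Close();
        }
        return chunks;
    }

    // Copies pages first..last (1-based, inclusive) and returns the finished PDF.
    private static byte[] BuildChunk(PdfReader reader, int first, int last)
    {
        using (var ms = new MemoryStream())
        {
            var document = new Document(reader.GetPageSizeWithRotation(first));
            var copy = new PdfCopy(document, ms);
            document.Open();
            for (int page = first; page <= last; page++)
            {
                copy.AddPage(copy.GetImportedPage(reader, page));
            }
            document.Close(); // finalizes the PDF, so the size is now final
            return ms.ToArray(); // ToArray still works on a closed MemoryStream
        }
    }
}
```

The call site from the question would then reduce to something like `List<byte[]> documents = PdfSizeSplitter.SplitBySize(currentDocument.Data, maxSize);`. Each chunk is rebuilt a few times during the search, so this trades extra CPU time for a size check that reflects the closed document.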