Convert PDF file pages to images using iTextSharp
Updated: 2023-09-27 18:30:48
I want to convert PDF pages to images using the iTextSharp library.
Any idea how to convert each page to an image file?
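iText/iTextSharp can generate new PDFs and modify existing ones, but they do not perform any rendering, which is what you are looking for. I recommend looking at Ghostscript or another library that knows how to actually render a PDF.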
You can use ImageMagick to convert a PDF to an image:
convert -density 300 "d:\1.pdf" -scale @1500000 "d:\a.jpg"
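For a multi-page PDF, ImageMagick lets you pick a single page by appending a zero-based index in square brackets to the input file name. The sketch below wraps that in two small shell functions. `page_spec` and `convert_page` are hypothetical helper names, the `IM` variable is an assumption that lets you override the executable (ImageMagick 7 installs it as `magick`), and the density and scale values are simply carried over from the command above. ImageMagick hands PDF rendering to Ghostscript, so Ghostscript has to be installed as well.

```shell
#!/bin/sh
# page_spec PDF PAGE
# Prints the ImageMagick input spec for a 1-based page number.
# ImageMagick counts frames from 0, so page 1 becomes "file.pdf[0]".
page_spec() {
    printf '%s[%d]' "$1" "$(( $2 - 1 ))"
}

# convert_page PDF PAGE OUTPUT
# Renders one page at 300 DPI and scales it to roughly 1.5 megapixels.
# Set IM=magick (or a full path) if "convert" is not the right executable.
convert_page() {
    "${IM:-convert}" -density 300 "$(page_spec "$1" "$2")" -scale @1500000 "$3"
}
```

For example, `convert_page 1.pdf 3 page3.jpg` would render only the third page.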
To split the PDF into single pages first, you can use iTextSharp.
This is someone else's code:
// Splits a PDF into one single-page PDF per page, written next to the
// source file as <name>_1.pdf, <name>_2.pdf, ...
void SplitePDF(string filepath)
{
    string dir = System.IO.Path.GetDirectoryName(filepath);
    string name = System.IO.Path.GetFileNameWithoutExtension(filepath);
    string ext = System.IO.Path.GetExtension(filepath);
    iTextSharp.text.pdf.PdfReader reader = new iTextSharp.text.pdf.PdfReader(filepath);
    try
    {
        reader.RemoveUnusedObjects();
        int pageCount = reader.NumberOfPages;
        // iTextSharp page numbers are 1-based.
        for (int i = 1; i <= pageCount; i++)
        {
            string outfile = System.IO.Path.Combine(dir, name + "_" + i.ToString() + ext);
            // Size the new document to match the source page, including its rotation.
            iTextSharp.text.Document doc = new iTextSharp.text.Document(reader.GetPageSizeWithRotation(i));
            using (System.IO.FileStream fs = new System.IO.FileStream(outfile, System.IO.FileMode.Create))
            {
                iTextSharp.text.pdf.PdfCopy pdfCpy = new iTextSharp.text.pdf.PdfCopy(doc, fs);
                // Enable full compression before the document is opened.
                pdfCpy.SetFullCompression();
                doc.Open();
                pdfCpy.AddPage(pdfCpy.GetImportedPage(reader, i));
                doc.Close();
            }
        }
    }
    finally
    {
        // Close the reader only after every page has been copied.
        reader.Close();
    }
}
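You can use Ghostscript to convert PDF files to images. I use the following parameters to convert the PDF into a TIFF image with multiple frames:
gswin32c.exe -sDEVICE=tiff12nc -dBATCH -r200 -dNOPAUSE -sOutputFile=[Output].tiff [PDF FileName]
You can also add the -q parameter for quiet mode. More information about the available output devices can be found in the Ghostscript documentation.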
After that, I can easily load the TIFF frames like this:
// BitmapDecoder and BitmapEncoder come from System.Windows.Media.Imaging (WPF).
using (FileStream stream = new FileStream(@"C:\tEMP\image_$i.tiff", FileMode.Open, FileAccess.Read, FileShare.Read))
{
    BitmapDecoder dec = BitmapDecoder.Create(stream, BitmapCreateOptions.IgnoreImageCache, BitmapCacheOption.None);
    BitmapEncoder enc = BitmapEncoder.Create(dec.CodecInfo.ContainerFormat);
    // frameIndex is the zero-based index of the frame (page) to pick.
    enc.Frames.Add(dec.Frames[frameIndex]);
}
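If one image file per page is more convenient than a multi-frame TIFF, Ghostscript can also write numbered files when the output name contains a `%d`-style placeholder. The following is a minimal shell sketch under a few assumptions: the executable is called `gs` (on Windows it would be `gswin32c.exe` or `gswin64c.exe`), the `png16m` device (24-bit colour PNG) is what you want, and `output_pattern` and `pdf_to_png` are hypothetical helper names, not part of Ghostscript.

```shell
#!/bin/sh
# output_pattern PDF OUTDIR
# Prints the -sOutputFile pattern for a PDF, e.g. "out/report_%03d.png".
# Ghostscript replaces %03d with the page number (001, 002, ...).
output_pattern() {
    printf '%s/%s_%%03d.png' "$2" "$(basename "$1" .pdf)"
}

# pdf_to_png PDF OUTDIR [DPI]
# Renders every page of PDF to a separate PNG in OUTDIR (default 200 DPI).
# Set GS to override the executable name, e.g. GS=gswin64c.exe.
pdf_to_png() {
    pdf=$1
    outdir=$2
    dpi=${3:-200}
    mkdir -p "$outdir" || return 1
    "${GS:-gs}" -q -dBATCH -dNOPAUSE -sDEVICE=png16m "-r$dpi" \
        "-sOutputFile=$(output_pattern "$pdf" "$outdir")" "$pdf"
}
```

Calling `pdf_to_png report.pdf out 150` would then be expected to leave `out/report_001.png`, `out/report_002.png`, and so on, which can be opened like any other image file.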
I did it with the MuPDFCore NuGet package. This is the guide I used: https://giorgiobianchini.com/MuPDFCore/MuPDFCore.pdf
using System;
using System.Threading.Tasks;
using MuPDFCore;
using VectSharp.Raster;

MuPDFContext context = new MuPDFContext();
MuPDFDocument document = new MuPDFDocument(context, @"C:\install\test.pdf");

//Renderers: one per page
MuPDFMultiThreadedPageRenderer[] renderers = new MuPDFMultiThreadedPageRenderer[document.Pages.Count];
//Page size: one per page
RoundedSize[] renderedPageSizes = new RoundedSize[document.Pages.Count];
//Boundaries of the tiles that make up each page: one array per page, with one element per thread
RoundedRectangle[][] tileBounds = new RoundedRectangle[document.Pages.Count][];
//Addresses of the memory areas where the image data of the tiles will be stored: one array per page, with one element per thread
IntPtr[][] destinations = new IntPtr[document.Pages.Count][];

//Cycle through the pages in the document to initialise everything
for (int i = 0; i < document.Pages.Count; i++)
{
    //Initialise the renderer for the current page, using two threads (total number of threads: number of pages x 2)
    renderers[i] = document.GetMultiThreadedRenderer(i, 2);
    //Determine the boundaries of the page when it is rendered with a 2x zoom factor.
    //The zoom factor controls quality; other values such as 0.5 or 1 can be used.
    RoundedRectangle roundedBounds = document.Pages[i].Bounds.Round(2);
    renderedPageSizes[i] = new RoundedSize(roundedBounds.Width, roundedBounds.Height);
    //Determine the boundaries of each tile by splitting the total size of the page by the number of threads.
    tileBounds[i] = renderedPageSizes[i].Split(renderers[i].ThreadCount);
    destinations[i] = new IntPtr[renderers[i].ThreadCount];
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        //Allocate the required memory for the j-th tile of the i-th page.
        //Since we will be rendering with a 24-bit-per-pixel format, the required memory in bytes is height x width x 3.
        destinations[i][j] = System.Runtime.InteropServices.Marshal.AllocHGlobal(tileBounds[i][j].Height * tileBounds[i][j].Width * 3);
    }
}

//Start the actual rendering operations in parallel.
Parallel.For(0, document.Pages.Count, i =>
{
    renderers[i].Render(renderedPageSizes[i], document.Pages[i].Bounds, destinations[i], PixelFormats.RGB);
});

//The code in this for-loop is not really part of MuPDFCore - it just shows an example of using VectSharp to "stitch" the tiles up and produce the full image.
for (int i = 0; i < document.Pages.Count; i++)
{
    //Create a new (empty) image to hold the whole page.
    VectSharp.Page renderedPage = new VectSharp.Page(renderedPageSizes[i].Width,
        renderedPageSizes[i].Height);
    //Draw each tile onto the image.
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        //Create a raster image object containing the pixel data. Yay, we do not need to copy/marshal anything!
        VectSharp.RasterImage tile = new VectSharp.RasterImage(destinations[i][j], tileBounds[i][j].Width,
            tileBounds[i][j].Height, false, false);
        //Draw the tile on the main image page.
        renderedPage.Graphics.DrawRasterImage(tileBounds[i][j].X0, tileBounds[i][j].Y0, tile);
    }
    //Save the full page as a PNG image.
    renderedPage.SaveAsPNG(@"C:\install\page" + i.ToString() + ".png");
}

//Clean-up code.
for (int i = 0; i < document.Pages.Count; i++)
{
    //Release the allocated memory.
    for (int j = 0; j < renderers[i].ThreadCount; j++)
    {
        System.Runtime.InteropServices.Marshal.FreeHGlobal(destinations[i][j]);
    }
    //Release the renderer (if you skip this, the quiescent renderer's threads will not be stopped, and your application will never exit!)
    renderers[i].Dispose();
}
document.Dispose();
context.Dispose();
You can extract the images from a PDF and save them as JPG. Here is sample code; you need iTextSharp.
using System;
using System.Collections.Generic;
using System.Globalization;
using iTextSharp.text.pdf;
using iTextSharp.text.pdf.parser;

public IEnumerable<System.Drawing.Image> ExtractImagesFromPDF(string sourcePdf)
{
    // NOTE: This will only get the first image it finds per page.
    var pdf = new PdfReader(sourcePdf);
    var raf = new RandomAccessFileOrArray(sourcePdf);
    try
    {
        for (int pageNum = 1; pageNum <= pdf.NumberOfPages; pageNum++)
        {
            PdfDictionary pg = pdf.GetPageN(pageNum);
            // Recursively search pages, forms and groups for images.
            // ExtractImagesFromPDF_FindImageInPDFDictionary is a helper that is not shown in this answer.
            PdfObject obj = ExtractImagesFromPDF_FindImageInPDFDictionary(pg);
            if (obj != null)
            {
                // Resolve the indirect reference to the image stream and decode it.
                int XrefIndex = Convert.ToInt32(((PRIndirectReference)obj).Number.ToString(CultureInfo.InvariantCulture));
                PdfObject pdfObj = pdf.GetPdfObject(XrefIndex);
                PdfStream pdfStrem = (PdfStream)pdfObj;
                PdfImageObject pdfImage = new PdfImageObject((PRStream)pdfStrem);
                System.Drawing.Image img = pdfImage.GetDrawingImage();
                yield return img;
            }
        }
    }
    finally
    {
        pdf.Close();
        raf.Close();
    }
}