Merging multiple Word documents into one with Open XML
Keywords: Open XML, Word, document, merge | Updated: 2023-09-27 18:28:29
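I have about 10 Word documents that I generate using Open XML and other tools. Now I want to create another Word document and join them into this newly created document, one after another. I want to use Open XML; any hints would be appreciated. Here is my code: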
private void CreateSampleWordDocument()
{
//string sourceFile = Path.Combine("D:\\GeneralLetter.dot");
//string destinationFile = Path.Combine("D:\\New.doc");
string sourceFile = Path.Combine("D:\\GeneralWelcomeLetter.docx");
string destinationFile = Path.Combine("D:\\New.docx");
try
{
// Create a copy of the template file and open the copy
//File.Copy(sourceFile, destinationFile, true);
using (WordprocessingDocument document = WordprocessingDocument.Open(destinationFile, true))
{
// Change the document type to Document
document.ChangeDocumentType(DocumentFormat.OpenXml.WordprocessingDocumentType.Document);
//Get the Main Part of the document
MainDocumentPart mainPart = document.MainDocumentPart;
mainPart.Document.Save();
}
}
catch
{
}
}
Update (using AltChunks):
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
{
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2) ;
MainDocumentPart mainPart = myDoc.MainDocumentPart;
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(
AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open("D:\\Test1.docx", FileMode.Open))
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
Why does this code overwrite the content of the last file when I use multiple files?
Update 2:
using (WordprocessingDocument myDoc = WordprocessingDocument.Open("D:\\Test.docx", true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 3);
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML, altChunkId);
using (FileStream fileStream = File.Open("d:\\Test1.docx", FileMode.Open))
{
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body
.Elements<Paragraph>().Last());
mainPart.Document.Save();
}
using (FileStream fileStream = File.Open("d:\\Test2.docx", FileMode.Open))
{
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body
.Elements<Paragraph>().Last());
}
using (FileStream fileStream = File.Open("d:\\Test3.docx", FileMode.Open))
{
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
mainPart.Document
.Body
.InsertAfter(altChunk, mainPart.Document.Body
.Elements<Paragraph>().Last());
}
}
This code appends the Test2 data twice, in place of the Test1 data. That is, I get:
Test
Test2
Test2
instead of:
Test
Test1
Test2
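Using only the Open XML SDK, you can merge multiple documents into one with the AltChunk element.
This link provides an easy way to assemble multiple Word documents, and "How to Use altChunk for Document Assembly" provides some examples.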
Edit 1
Based on the altChunk code in your updated question (Update #1), here is VB.Net code that I have tested; it works like a charm for me:
Using myDoc = DocumentFormat.OpenXml.Packaging.WordprocessingDocument.Open("D:\Test.docx", True)
Dim altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2)
Dim mainPart = myDoc.MainDocumentPart
Dim chunk = mainPart.AddAlternativeFormatImportPart(
DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML, altChunkId)
Using fileStream As IO.FileStream = IO.File.Open("D:\Test1.docx", IO.FileMode.Open)
chunk.FeedData(fileStream)
End Using
Dim altChunk = New DocumentFormat.OpenXml.Wordprocessing.AltChunk()
altChunk.Id = altChunkId
mainPart.Document.Body.InsertAfter(altChunk, mainPart.Document.Body.Elements(Of DocumentFormat.OpenXml.Wordprocessing.Paragraph).Last())
mainPart.Document.Save()
End Using
Edit 2
Second issue (Update #2):
"This code appends the Test2 data twice, in place of the Test1 data as well"
This has to do with the altChunkId.
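For each document you want to merge into the main document, you need to:
- add an AlternativeFormatImportPart to the MainDocumentPart, with an Id that must be unique. This part contains the inserted data.
- add an AltChunk element to the body, with its id set to reference the AlternativeFormatImportPart added in the previous step.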
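In your code, all the AltChunks use the same Id, and all three files are fed into the same AlternativeFormatImportPart (FeedData replaces the part's content each time). That is why you see the same text several times.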
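I'm also not sure your code produces a unique altChunkId: string altChunkId = "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2); Substring(0, 2) keeps only the first two digits of the tick count, which stay the same for years, so every call returns the same id.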
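To see why that id repeats, here is a small sketch that uses only the .NET base class library. The class AltChunkIdDemo and the counter-based alternative are hypothetical names for illustration, not part of the Open XML SDK; with the SDK, letting it generate the id avoids the problem entirely.

```csharp
using System;

// Hypothetical demo class: compares the id scheme from the question with a simple counter.
public static class AltChunkIdDemo
{
    // Scheme from the question: "AltChunkId" + only the first two digits of DateTime.Now.Ticks.
    // The leading digits of the tick count change only once every few decades.
    public static string TruncatedTicksId() =>
        "AltChunkId" + DateTime.Now.Ticks.ToString().Substring(0, 2);

    private static int _counter;

    // Hypothetical alternative: a counter never repeats within one run of the program.
    public static string CounterId() => "AltChunkId" + (++_counter);
}
```

Calling TruncatedTicksId() twice in a row returns the same string (for example "AltChunkId63" in 2023), while CounterId() returns "AltChunkId1", then "AltChunkId2".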
If you don't need a specific value, I recommend not setting the AltChunkId explicitly when you add the AlternativeFormatImportPart. Instead, let the SDK generate one, like this:
VB.Net
Dim chunk As AlternativeFormatImportPart = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML)
Dim altchunkid As String = mainPart.GetIdOfPart(chunk)
C#
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(DocumentFormat.OpenXml.Packaging.AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);
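Combining the two steps listed above with an SDK-generated id, one possible helper is sketched below; AltChunkMerge and AppendAsAltChunk are hypothetical names. It takes a Stream rather than a path so it also works with in-memory documents. One design choice is stated as an assumption: the body of an existing document usually ends with a sectPr (SectionProperties) element, which the WordprocessingML schema requires to be the body's last child, so the sketch inserts the altChunk before it instead of appending after it.

```csharp
using System.IO;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Hypothetical helper, not an SDK API.
public static class AltChunkMerge
{
    // Appends one source .docx to mainPart as an altChunk and returns the generated id.
    // Assumes mainPart.Document and its Body already exist.
    public static string AppendAsAltChunk(MainDocumentPart mainPart, Stream sourceDocx)
    {
        // Step 1: a new import part for this file; the SDK assigns a unique relationship id.
        AlternativeFormatImportPart chunk =
            mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
        string id = mainPart.GetIdOfPart(chunk);
        chunk.FeedData(sourceDocx); // copies the whole .docx package into the part

        // Step 2: an AltChunk in the body that references the part by id.
        var altChunk = new AltChunk { Id = id };
        Body body = mainPart.Document.Body;
        if (body.LastChild is SectionProperties sectPr)
        {
            // Keep sectPr as the last child of the body.
            sectPr.InsertBeforeSelf(altChunk);
        }
        else
        {
            body.AppendChild(altChunk);
        }
        return id;
    }
}
```

To get the Test, Test1, Test2 order from the question, you would open D:\Test.docx for editing, call AppendAsAltChunk once per file (for example with File.OpenRead(path)), and then call mainPart.Document.Save().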
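There is a nice wrapper API around Open XML (Document Builder 2.2) that is designed specifically for merging documents and gives you flexibility in choosing which paragraphs to merge, etc. You can download it from here (update: moved to GitHub).
There is documentation and screenshots on how to use it here.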
Update: code sample
var sources = new List<Source>();
//Document Streams (File Streams) of the documents to be merged.
foreach (var stream in documentstreams)
{
var tempms = new MemoryStream();
stream.CopyTo(tempms);
sources.Add(new Source(new WmlDocument(stream.Length.ToString(), tempms), true));
}
var mergedDoc = DocumentBuilder.BuildDocument(sources);
mergedDoc.SaveAs(@"C:\TargetFilePath");
The Source and WmlDocument types come from the Document Builder API.
You can even pass file paths directly, if you prefer:
sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged1.docx")));
sources.Add(new Source(new WmlDocument(@"C:\FileToBeMerged2.docx")));
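I found this nice comparison between the AltChunk and Document Builder approaches to merging documents; it helps in choosing one based on your requirements.
You can also use the DocX library to merge documents, but I prefer Document Builder for merging documents.
Hope this helps.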
The only thing missing from these answers is a for loop.
For those who just want to copy/paste it:
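using System.Collections.Generic;
using System.IO;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;

// Place this method inside a class.
// Creates resultFile and appends every document in filenames to it, in order, as altChunks.
void MergeInNewFile(string resultFile, IList<string> filenames)
{
    using (WordprocessingDocument document = WordprocessingDocument.Create(resultFile, WordprocessingDocumentType.Document))
    {
        MainDocumentPart mainPart = document.AddMainDocumentPart();
        mainPart.Document = new Document(new Body());
        foreach (string filename in filenames)
        {
            // One new import part per file; the SDK assigns it a unique relationship id.
            AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
            string altChunkId = mainPart.GetIdOfPart(chunk);
            using (FileStream fileStream = File.Open(filename, FileMode.Open))
            {
                chunk.FeedData(fileStream);
            }
            // Reference that part from the body so Word imports its content at this position.
            AltChunk altChunk = new AltChunk { Id = altChunkId };
            mainPart.Document.Body.AppendChild(altChunk);
        }
        mainPart.Document.Save();
    }
}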
All credit goes to Chris and yonexbat.
Easy to use in C#:
using System;
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
namespace WordMergeProject
{
    public class Program
    {
        private static void Main(string[] args)
        {
            byte[] word1 = File.ReadAllBytes(@"..\..\word1.docx");
            byte[] word2 = File.ReadAllBytes(@"..\..\word2.docx");
            byte[] result = Merge(word1, word2);
            File.WriteAllBytes(@"..\..\word3.docx", result);
        }
        // Appends the document in 'src' to the end of the document in 'dest', after a page break,
        // and returns the merged document as a new byte array.
        private static byte[] Merge(byte[] dest, byte[] src)
        {
            // Copy 'dest' into an expandable MemoryStream (new MemoryStream(byte[]) cannot grow).
            var memoryStreamDest = new MemoryStream();
            memoryStreamDest.Write(dest, 0, dest.Length);
            memoryStreamDest.Seek(0, SeekOrigin.Begin);
            using (var memoryStreamSrc = new MemoryStream(src))
            using (WordprocessingDocument doc = WordprocessingDocument.Open(memoryStreamDest, true))
            {
                MainDocumentPart mainPart = doc.MainDocumentPart;
                Body body = mainPart.Document.Body;
                // Let the SDK generate the id so it stays unique even when Merge is called repeatedly.
                AlternativeFormatImportPart altPart =
                    mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
                string altChunkId = mainPart.GetIdOfPart(altPart);
                altPart.FeedData(memoryStreamSrc);
                var altChunk = new AltChunk { Id = altChunkId };
                // Insert after the last merged chunk, or else after the last paragraph.
                OpenXmlElement lastElem = body.Elements<AltChunk>().LastOrDefault();
                if (lastElem == null)
                {
                    lastElem = body.Elements<Paragraph>().Last();
                }
                // Insert a page break, then the new chunk after it.
                Paragraph pageBreakP = new Paragraph(new Run(new Break() { Type = BreakValues.Page }));
                lastElem.InsertAfterSelf(pageBreakP);
                pageBreakP.InsertAfterSelf(altChunk);
                mainPart.Document.Save();
            }
            // The package is written back to the stream when the document is disposed,
            // so read the bytes only after the using block.
            return memoryStreamDest.ToArray();
        }
    }
}
My solution:
using DocumentFormat.OpenXml;
using DocumentFormat.OpenXml.Packaging;
using DocumentFormat.OpenXml.Wordprocessing;
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Linq;
namespace TestFusionWord
{
internal class Program
{
public static void MergeDocx(List<string> ListPathFilesToMerge, string DestinationPathFile, bool OverWriteDestination, bool WithBreakPage)
{
#region Control arguments
List<string> ListError = new List<string>();
if (ListPathFilesToMerge == null || ListPathFilesToMerge.Count == 0)
{
ListError.Add("There are no files to merge in the list passed in the ListPathFilesToMerge parameter");
}
else
{
foreach (var item in ListPathFilesToMerge.Where(x => Path.GetExtension(x.ToLower()) != ".docx"))
{
ListError.Add(string.Format("The file '{0}' in the list passed in the ListPathFilesToMerge parameter does not have the .docx extension", item));
}
foreach (var item in ListPathFilesToMerge.Where(x => !File.Exists(x)))
{
ListError.Add(string.Format("The file '{0}' in the list passed in the ListPathFilesToMerge parameter does not exist", item));
}
}
if (string.IsNullOrWhiteSpace(DestinationPathFile))
{
ListError.Add("The destination file passed in the DestinationPathFile parameter cannot be empty");
}
else
{
if (Path.GetExtension(DestinationPathFile.ToLower()) != ".docx")
{
ListError.Add(string.Format("The destination file '{0}' given in the DestinationPathFile parameter does not have the .docx extension", DestinationPathFile));
}
if (File.Exists(DestinationPathFile) && !OverWriteDestination)
{
ListError.Add(string.Format("The destination file '{0}' already exists. Use the OverWriteDestination argument if you want to overwrite it", DestinationPathFile));
}
}
if (ListError.Any())
{
string MessageError = "Errors were encountered, details: " + Environment.NewLine + ListError.Select(x => "- " + x).Aggregate((x, y) => x + Environment.NewLine + y);
throw new ArgumentException(MessageError);
}
#endregion Control arguments
#region Merge Files
//Delete the destination file (no error is raised if the file does not exist)
File.Delete(DestinationPathFile);
//Create the empty destination file
using (WordprocessingDocument document = WordprocessingDocument.Create(DestinationPathFile, WordprocessingDocumentType.Document))
{
MainDocumentPart mainPart = document.AddMainDocumentPart();
mainPart.Document = new Document(new Body());
document.MainDocumentPart.Document.Save();
}
//Merge the documents
using (WordprocessingDocument myDoc = WordprocessingDocument.Open(DestinationPathFile, true))
{
MainDocumentPart mainPart = myDoc.MainDocumentPart;
Body body = mainPart.Document.Body;
for (int i = 0; i < ListPathFilesToMerge.Count; i++)
{
string currentpathfile = ListPathFilesToMerge[i];
AlternativeFormatImportPart chunk = mainPart.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.WordprocessingML);
string altchunkid = mainPart.GetIdOfPart(chunk);
using (FileStream fileStream = File.Open(currentpathfile, FileMode.Open))
chunk.FeedData(fileStream);
AltChunk altChunk = new AltChunk();
altChunk.Id = altchunkid;
OpenXmlElement last = body.Elements().LastOrDefault(e => e is AltChunk || e is Paragraph);
body.InsertAfter(altChunk, last);
if (WithBreakPage && i < ListPathFilesToMerge.Count - 1) // If it's not the last file, add a page break
{
last = body.Elements().LastOrDefault(e => e is AltChunk || e is Paragraph);
last.InsertAfterSelf(new Paragraph(new Run(new Break() { Type = BreakValues.Page })));
}
}
mainPart.Document.Save();
}
#endregion Merge Files
}
private static int Main(string[] args)
{
try
{
string DestinationPathFile = @"C:\temp\testfusion\docfinal.docx";
List<string> ListPathFilesToMerge = new List<string>()
{
@"C:\temp\testfusion\fichier1.docx",
@"C:\temp\testfusion\fichier2.docx",
@"C:\temp\testfusion\fichier3.docx"
};
ListPathFilesToMerge.Sort(); //Sort so the files are always merged in the same order
MergeDocx(ListPathFilesToMerge, DestinationPathFile, true, true);
#if DEBUG
Process.Start(new ProcessStartInfo(DestinationPathFile) { UseShellExecute = true }); //open the file with its associated application
#endif
return 0;
}
catch (Exception Ex)
{
Console.Error.WriteLine(Ex.Message);
//Log exception here
return -1;
}
}
}
}