Need to merge PDF files stored in SharePoint? Look no further!
In this article, I will show you how to create an Azure Function to merge PDF files stored in SharePoint. The function will be a generic service that receives a list of file paths to merge. This means you can trigger it from SPFx, Power Automate, Logic Apps… or anything else, really. We are going to use the PDFsharp library, so our code will be super simple!
Update 01-02-2022
As I was getting requests to publish the full solution, I asked my client, and I am happy to announce that they have agreed to let me publish the source code!
Any feedback is welcome, and please let me know if you find issues.
The inconvenience of Power Automate
If you are familiar with Power Automate, you may already know that you can use third-party actions to merge PDF files. However, they come with some significant disadvantages that may prevent you from using them:
- License costs – third-party providers typically charge per user or per number of executions.
- Data transfer – when using a remote service to merge your files, you are sending your data to that service. While this may not be a problem for files without commercial value, the same is not true for confidential information. The service provider may have strict security arrangements in place, but sometimes, the risk may be just too high.
Function advantages
Using an Azure Function ultimately means that you are in full control. Compared with the disadvantages of third-party solutions in Power Automate:
- Cost – the merging process is fast, so you can run the function on a consumption plan, which is billed per execution and includes a monthly free grant. Yes, it's almost FREE!
- Data transfer – the information flows only between your Office 365 tenant and your Azure subscription. Everything can be done with memory streams, so there are no temporary files to store and clean up afterwards.
Merge PDF files
NuGet packages
- PDFsharp
- SharePointPnPCoreOnline
Code
The code to merge files is actually very simple. First, we create a class that represents a request to the function. In my case, I used an Azure Storage queue as the entry point for my function, and the queue messages had to match this class:
internal class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}
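Because the function trusts whatever arrives on the queue, it can help to reject incomplete requests before any SharePoint call is made. The sketch below is a hypothetical helper, not part of the original solution: it swaps in System.Text.Json (the article itself uses Newtonsoft's JsonConvert) and redeclares QueueItem so the block compiles on its own.

```csharp
using System;
using System.Text.Json;

// Redeclared so the sketch is self-contained; same shape as QueueItem above.
public class QueueItem
{
    public string SiteUrl { get; set; }
    public string FolderPath { get; set; }
    public string FileName { get; set; }
    public string[] FilesPathArray { get; set; }
}

public static class QueueItemParser
{
    // Hypothetical helper: deserialize a queue message and fail fast
    // if any field needed for the merge is missing.
    public static QueueItem Parse(string message)
    {
        QueueItem item = JsonSerializer.Deserialize<QueueItem>(message);
        if (item == null)
        {
            throw new ArgumentException("The queue message contained no merge request.");
        }

        if (string.IsNullOrWhiteSpace(item.SiteUrl) ||
            string.IsNullOrWhiteSpace(item.FolderPath) ||
            string.IsNullOrWhiteSpace(item.FileName))
        {
            throw new ArgumentException("SiteUrl, FolderPath and FileName are required.");
        }

        if (item.FilesPathArray == null || item.FilesPathArray.Length == 0)
        {
            throw new ArgumentException("FilesPathArray must contain at least one file path.");
        }

        return item;
    }
}
```

In the queue-triggered function, a call like QueueItemParser.Parse(myQueueItem) could take the place of the JsonConvert line, so a malformed message fails with a readable error instead of a null reference further down.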
And I have created a method that does all the work:
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.WebJobs.Host;   // TraceWriter (Functions v1)
using Microsoft.SharePoint.Client;    // CSOM + PnP extension methods (ExecuteQueryRetryAsync, UploadFile)
using PdfSharp.Pdf;
using PdfSharp.Pdf.IO;

// Returns Task (not void) so the caller can await it before disposing the ClientContext
internal static async Task MergePDFs(ClientContext ctx, QueueItem queueItem, TraceWriter log)
{
    log.Info("Creating blank PDF file...");
    // instantiate the new (target) document
    using (PdfDocument targetDoc = new PdfDocument())
    {
        // parse all files in the array, in order
        log.Info($"Parsing {queueItem.FilesPathArray.Length} PDF files");
        foreach (string filePath in queueItem.FilesPathArray)
        {
            log.Info($"Parsing PDF file: {filePath}");
            // get the file content from SharePoint
            Microsoft.SharePoint.Client.File file = ctx.Web.GetFileByUrl(filePath);
            ClientResult<Stream> fileStream = file.OpenBinaryStream();
            ctx.Load(file);
            await ctx.ExecuteQueryRetryAsync();
            // open the source file in import mode and copy its pages to the target
            using (PdfDocument pdfDoc = PdfReader.Open(fileStream.Value, PdfDocumentOpenMode.Import))
            {
                for (int i = 0; i < pdfDoc.PageCount; i++)
                {
                    targetDoc.AddPage(pdfDoc.Pages[i]);
                }
            }
        }
        log.Info("PDF files parsed successfully");
        // save the merged document to memory
        using (MemoryStream newFileStream = new MemoryStream())
        {
            targetDoc.Save(newFileStream);
            // rewind the stream so the upload starts from the first byte
            newFileStream.Position = 0;
            // upload to SharePoint, overwriting any existing file with the same name
            Folder destinationFolder = ctx.Web.GetFolderByServerRelativeUrl(queueItem.FolderPath);
            ctx.Load(destinationFolder);
            await ctx.ExecuteQueryRetryAsync();
            destinationFolder.UploadFile(queueItem.FileName, newFileStream, true);
            await ctx.ExecuteQueryRetryAsync();
            log.Info($"Final PDF file added to SharePoint: {queueItem.FolderPath}/{queueItem.FileName}");
        }
    }
}
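GetFolderByServerRelativeUrl is named for a server-relative URL (for example "/sites/testsite/Shared Documents/Test"), while the sample request further down passes FolderPath as an absolute, percent-encoded URL. If SharePoint rejects the absolute form in your tenant, a small helper like this could normalize the value first. It is a hypothetical sketch using only System.Uri, not part of CSOM or PnP.

```csharp
using System;

public static class SharePointPaths
{
    // Hypothetical helper: converts an absolute SharePoint URL such as
    // "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test"
    // into the decoded server-relative form "/sites/testsite/Shared Documents/Test".
    // Values that already start with "/" are returned unchanged.
    public static string ToServerRelative(string pathOrUrl)
    {
        if (pathOrUrl.StartsWith("/", StringComparison.Ordinal))
        {
            return pathOrUrl;
        }

        Uri uri = new Uri(pathOrUrl, UriKind.Absolute);
        // AbsolutePath keeps percent-encoding (e.g. %20), so decode it
        return Uri.UnescapeDataString(uri.AbsolutePath);
    }
}
```

Usage would then be ctx.Web.GetFolderByServerRelativeUrl(SharePointPaths.ToServerRelative(queueItem.FolderPath)).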
Now, in your main Function file, simply:
– deserialize the queue message,
– instantiate the SharePoint context (the PnP Core package makes authentication really simple),
– call the MergePDFs function.
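// inside an async Task function, so the merge can be awaited
QueueItem queueItem = JsonConvert.DeserializeObject<QueueItem>(myQueueItem);
// authId/authSecret: the app-only client ID and secret registered for SharePoint
using (ClientContext ctx = new AuthenticationManager().GetAppOnlyAuthenticatedContext(queueItem.SiteUrl, authId, authSecret))
{
    // await so the context is not disposed while the merge and upload are still running
    await MergePDFs(ctx, queueItem, log);
}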
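The snippet leaves open where authId and authSecret come from. Azure Functions exposes application settings as environment variables, so one option is to read them there and fail early with a clear message when a setting is missing, rather than getting a less obvious error from GetAppOnlyAuthenticatedContext. The helper and the setting names in the usage line are hypothetical.

```csharp
using System;

public static class FunctionSettings
{
    // Hypothetical helper: reads an application setting (exposed as an
    // environment variable) and throws if it is missing or empty.
    public static string GetRequired(string name)
    {
        string value = Environment.GetEnvironmentVariable(name);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException($"Application setting '{name}' is missing or empty.");
        }
        return value;
    }
}
```

For example: string authId = FunctionSettings.GetRequired("SharePointClientId"); where "SharePointClientId" is whatever name you gave the setting.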
It's that simple! To use the service, just send it a message that matches the following format:
{
    "SiteUrl": "https://contoso.sharepoint.com/sites/testsite",
    "FolderPath": "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test",
    "FileName": "MergeResult.pdf",
    "FilesPathArray": [
        "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file1.pdf",
        "https://contoso.sharepoint.com/sites/testsite/Shared%20Documents/Test/file2.pdf"
    ]
}
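If you add this message to the storage queue yourself, note that the Functions queue trigger expects Base64-encoded message bodies by default; a plain-text JSON body may produce the "not a valid Base-64 string" error reported in the comments below. A minimal sketch, assuming System.Text.Json, with a hypothetical helper name:

```csharp
using System;
using System.Text;
using System.Text.Json;

public static class QueueMessageBuilder
{
    // Hypothetical helper: serializes a merge request and Base64-encodes it,
    // the body format the queue trigger decodes by default.
    public static string Build(string siteUrl, string folderPath, string fileName, string[] filesPathArray)
    {
        var request = new
        {
            SiteUrl = siteUrl,
            FolderPath = folderPath,
            FileName = fileName,
            FilesPathArray = filesPathArray
        };
        string json = JsonSerializer.Serialize(request);
        return Convert.ToBase64String(Encoding.UTF8.GetBytes(json));
    }
}
```

If you send messages with the Azure.Storage.Queues SDK instead, setting QueueClientOptions.MessageEncoding to QueueMessageEncoding.Base64 should achieve the same result without manual encoding.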
Do you have a downloadable sample project for this? I am trying to create an Azure Function to test this, and new AuthenticationManager().GetAppOnlyAuthenticatedContext(siteUrl, authId, authSecret) keeps failing for me.
Unfortunately not
I have the same problem. A sample project would be amazing, since this is the only tutorial I can find.
I may give it a try if I find some spare time. Thanks for the feedback 🙂
Hi, this solution is very nice, but we cannot solve this error: 2020-11-19T15:13:59.432 [Error] Executed ‘Function1’ (Failed, Id=c7df2c89-44a2-43f0-a677-0f989f57ab20, Duration=766ms)Could not load type ‘System.Web.Configuration.WebConfigurationManager’ from assembly ‘System.Web, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a’. Any help? Thanks.
Hello, not sure if it's related, but are you using the v1 runtime of Azure Functions?
Hello, thanks for your reply. No, the runtime version is 3.0.1.
Hi, that is probably the issue then, as it should be v1.
Hi, I deployed everything without any issue. But when I test the function using the sample you provided, I get the following error message:
[Error] Executed ‘MergePDF’ (Failed, Id=XXXXXXXXXXXXXXXXX, Duration=3ms)The input is not a valid Base-64 string as it contains a non-base 64 character, more than two padding characters, or an illegal character among the padding characters.
Sample used for testing.
{
    "SiteUrl": "https://contoso.sharepoint.com/sites/contoso",
    "FolderPath": "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test",
    "FileName": "result.pdf",
    "FilesPathArray": [
        "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test/file1.pdf",
        "https://contoso.sharepoint.com/sites/contoso/Shared%20Documents/Test/file2.pdf"
    ]
}
Were you able to resolve this? All I can think of is that it could be related to the details used for authentication.
Hey Joel, what app service plan do you run to host these functions? We had a number of issues with sandbox security & PDFs in the past on consumption tier plans.
Hi, sorry for the late reply. These were running on consumption plans, what issues did you experience?
I see that under Function advantages you describe the solution as almost free.
I assume that's because the Microsoft subscriptions already cover the infrastructure needed to set up this Azure Function?
In other words, we don't need to pay anything extra to run this solution?
Hi Tim, sorry for the late reply. Your assumption was correct: I was referring to the very small cost of running Azure Functions, as this even works on a consumption plan.