Batch Transcription: Speech to Text in Microsoft Azure Logic Apps

Shailaja Natarajan
Jul 9, 2021

Digital transformation is leading businesses to automate their processes efficiently with cloud services. Companies are looking for ways to adapt quickly to changing market conditions and to offer better customer engagement while reducing costs. By helping developers quickly build and deploy business-critical workflows, Azure Logic Apps is an essential tool for key enterprise scenarios.

In the real world, most call-centre enterprises employ quality controllers who analyse customer calls to gauge customer satisfaction and improve the performance of call-centre agents. Batch transcription of this audio to text can be handled efficiently with Azure Logic Apps.

In this article, we walk through a step-by-step workflow for implementing batch transcription.

Batch transcription

Batch transcription is a set of REST API operations that enable us to transcribe a large amount of audio in storage. We can point to audio files with a shared access signature (SAS) URI and asynchronously receive transcription results.
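To make these REST operations concrete, the sketch below lists the calls this workflow relies on, using the public Speech to text v3.0 endpoint layout. The region, the transcription ID and the helper name `transcription_operations` are hypothetical, and the mapping of each call to a step of this article is an interpretation of the workflow, not something the Logic App exposes directly.

```python
# Hypothetical helper: region and transcription_id are placeholders you supply.
API_PATH = "speechtotext/v3.0/transcriptions"


def transcription_operations(region, transcription_id):
    """Return the (HTTP method, URL) pairs used over a transcription's lifetime."""
    base = f"https://{region}.api.cognitive.microsoft.com/{API_PATH}"
    item = f"{base}/{transcription_id}"
    return {
        "create": ("POST", base),            # Step 6: submit the audio SAS URIs
        "status": ("GET", item),             # Step 9.2: poll the job status
        "files": ("GET", f"{item}/files"),   # Step 10: list the result files
        "delete": ("DELETE", item),          # Step 13: remove the finished job
    }


ops = transcription_operations("westeurope", "1234")
for name, (method, url) in ops.items():
    print(f"{name:7} {method:6} {url}")
```

Every call is authenticated with the Speech resource key in the `Ocp-Apim-Subscription-Key` header; the Logic App HTTP actions in the steps below send the same requests.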

Logic App

Azure Logic Apps is a cloud service that helps automate tasks and business processes when we need to integrate apps, data, systems and services across enterprises.

Architecture diagram

Pre-requisites

Logic App Workflow

Step 1: Add a Blob Storage trigger that fires when a file is added to Azure Storage. Enter the name of the container that was already created in the storage account in the Azure Portal, and configure the polling interval that determines how often the trigger checks for new files.

Step 2: Create an Audio URL variable to hold the URL paths of all audio files.

Step 3: Create a variable to hold the content of the audio files.

Step 4: Add a 'Create SAS URI by path' action to get a SAS URI for the audio file that was added to blob storage in the trigger (see Step 1).

Step 5: Create a variable to hold the output of Step 4. The Speech API will write its transcription output to this path.
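In the Logic App, the 'Create SAS URI by path' action produces the SAS URI for you. The sketch below only illustrates the shape of such a URI and how Steps 2 to 5 collect it into an array. The account, container, blob name and token are hypothetical, and the token is not a real signature: generating a valid one requires the storage account key or a user delegation key (for example through the Azure SDK).

```python
from urllib.parse import quote


def blob_sas_url(account, container, blob_name, sas_token):
    """Compose https://<account>.blob.core.windows.net/<container>/<blob>?<sas>."""
    token = sas_token.lstrip("?")          # accept tokens with or without a leading '?'
    path = quote(blob_name, safe="/")      # keep virtual folders, escape spaces etc.
    return f"https://{account}.blob.core.windows.net/{container}/{path}?{token}"


# Step 2: the array variable that later becomes "contentUrls" in the POST body.
audio_urls = []
audio_urls.append(blob_sas_url("mystorageacct", "audio", "calls/call 01.wav",
                               "?sv=2020-08-04&sr=b&sig=REDACTED"))
print(audio_urls[0])
```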

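Step 6: Add an HTTP POST action that creates the transcription. Configure the Speech resource access key in the request header and copy the request body schema from the Azure Swagger documentation. Run the logic app once so that this step returns a response whose schema can be reused in the next step.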
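For readers who want to check the request outside the designer, here is a minimal sketch of the same POST using only the Python standard library. It assumes the v3.0 `transcriptions` endpoint and the `contentUrls`, `locale` and `displayName` body properties from the Swagger document; the region, key, locale and the function name `build_create_transcription_request` are placeholders or hypothetical. The function only builds the request so nothing is sent by accident.

```python
import json
import urllib.request


def build_create_transcription_request(region, speech_key, sas_urls,
                                       locale="en-US",
                                       display_name="Batch transcription"):
    """Build (but do not send) the v3.0 'create transcription' POST request."""
    url = (f"https://{region}.api.cognitive.microsoft.com"
           "/speechtotext/v3.0/transcriptions")
    body = {
        "contentUrls": list(sas_urls),   # SAS URIs collected in Steps 2-5
        "locale": locale,                # language spoken in the audio
        "displayName": display_name,     # free-text name for the job
    }
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Ocp-Apim-Subscription-Key": speech_key,   # Speech resource key
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_create_transcription_request(
    "westeurope", "<your-speech-key>",
    ["https://mystorageacct.blob.core.windows.net/audio/call1.wav?sig=REDACTED"])
print(req.get_method(), req.full_url)
```

Passing `req` to `urllib.request.urlopen` would submit the job; the JSON it returns is the 'Body' used as the sample payload in Step 7.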

Step 7: A 'Parse Transcription' (Parse JSON) step parses the content returned by the previous step. Use the 'Body' of the Step 6 response as the sample payload; the Logic App instance must be run up to Step 6 to obtain it.

Step 8: Create a Transcription status variable to hold the status from the 'POST HTTP' response of Step 6.
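Steps 7 and 8 keep only a few fields from the response. The sketch below shows which ones, using a trimmed, illustrative response body (a real v3.0 response contains more properties, and the IDs and URLs here are made up). The helper name `read_transcription_job` is hypothetical.

```python
import json


def read_transcription_job(response_body):
    """Pick out the fields the workflow reuses from a create/status response."""
    job = json.loads(response_body) if isinstance(response_body, str) else response_body
    return {
        "self": job["self"],                          # URL of this transcription job
        "status": job["status"],                      # NotStarted, Running, Succeeded or Failed
        "files": job.get("links", {}).get("files"),   # URL that lists the result files
    }


# Trimmed, illustrative response body.
sample = """{
  "self": "https://westeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/1234",
  "links": {"files": "https://westeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/1234/files"},
  "status": "NotStarted",
  "locale": "en-US",
  "displayName": "Batch transcription"
}"""
job = read_transcription_job(sample)
transcription_status = job["status"]   # Step 8: the "Transcription status" variable
print(transcription_status)
```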

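Step 9: Create an 'Until' control to check the transcription status. The loop polls once a minute until the status changes from 'NotStarted' (or 'Running') to 'Succeeded'. The exit condition should also accept 'Failed'; otherwise a failed job keeps the loop running until it reaches its limit.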

Step 9.1: Add a 1-minute Delay action before each status check.

Step 9.2: Add an HTTP GET request to check the transcription status.

Step 9.3: Parse the JSON response from Step 9.2.

Step 9.4: Add a 'Set transcription status' action so that the 'Until' condition is re-evaluated with the latest status.
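The 'Until' loop in Steps 9 to 9.4 can be sketched in plain Python as below. This is a sketch under stated assumptions, not the Logic Apps runtime: `get_status` stands in for the GET and Parse JSON actions, `sleep` is injectable so the loop can be tested without waiting, and `max_checks=60` mirrors the Until loop's default iteration limit (Logic Apps also applies a default one-hour timeout). The function name `wait_for_transcription` is hypothetical.

```python
import time

TERMINAL_STATES = {"Succeeded", "Failed"}


def wait_for_transcription(get_status, interval_seconds=60, max_checks=60,
                           sleep=time.sleep):
    """Mimic the 'Until' loop: delay, check the status, stop on a terminal state.

    get_status is any callable returning the current status string, for example
    a function that sends the Step 9.2 GET request and parses the JSON.
    """
    status = "NotStarted"
    for _ in range(max_checks):
        sleep(interval_seconds)          # Step 9.1: Delay (1 minute by default)
        status = get_status()            # Steps 9.2 and 9.3: GET and parse
        if status in TERMINAL_STATES:    # Step 9.4 feeds the loop condition
            return status
    raise TimeoutError(f"Transcription still '{status}' after {max_checks} checks")


# Usage with a fake status source instead of a real HTTP call:
fake_statuses = iter(["NotStarted", "Running", "Running", "Succeeded"])
result = wait_for_transcription(lambda: next(fake_statuses), sleep=lambda s: None)
print(result)
```

Treating 'Failed' as terminal is the main design point: the caller can then branch on the returned status instead of waiting for the loop limit.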

Step 10: Add a 'Get transcription files' HTTP action to retrieve the result files and their 'contentUrl' paths. Configure the request URL with the 'self' value parsed in Step 9.3 as dynamic content.

Step 11: Add a 'Parse transcription files' (Parse JSON) action to parse the HTTP response from Step 10.
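The files list returned in Step 10 contains one entry per transcribed audio file plus a report entry. A minimal sketch of picking out the transcript URLs follows; it assumes the v3.0 `values`, `kind` and `links.contentUrl` fields, and the sample data and the helper name `transcription_content_urls` are illustrative only.

```python
import json


def transcription_content_urls(files_body):
    """Return contentUrl for each result file of kind 'Transcription'.

    Entries of other kinds, such as 'TranscriptionReport', are skipped.
    """
    files = json.loads(files_body) if isinstance(files_body, str) else files_body
    return [f["links"]["contentUrl"]
            for f in files.get("values", [])
            if f.get("kind") == "Transcription"]


# Trimmed, illustrative files response.
sample_files = {
    "values": [
        {"kind": "Transcription", "name": "call1.wav.json",
         "links": {"contentUrl": "https://example.blob.core.windows.net/results/call1.json?sig=x"}},
        {"kind": "TranscriptionReport", "name": "report.json",
         "links": {"contentUrl": "https://example.blob.core.windows.net/results/report.json?sig=y"}},
    ]
}
urls = transcription_content_urls(sample_files)
print(urls)
```

These are the items the 'For each' loop in Step 12 iterates over.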

Step 12: Add a 'For each' loop to iterate over the transcription files in the JSON response body from Step 11.

Step 12.1: Add 'Get content' (HTTP) and 'Parse content' actions to fetch each transcription file from Azure Cognitive Services, then parse the response so that the text can be stored in the desired format in the Azure storage account.

Step 12.2: Add 'Parse JSON' and 'Set transcription' actions to retain the text content.

Step 12.3: A 'Create blob' action saves the transcription text in another container, 'test', in the storage account.

Step 12.4: An 'Insert Entity' action appends the entity to the database; here the text is saved in Azure Table storage.

Step 12.5: Finally, 'Reset variable' steps reset all local variables created in the workflow.
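Inside the loop, Steps 12.1 to 12.4 turn each result file into plain text and a table row. The sketch below assumes the v3.0 result layout, where `combinedRecognizedPhrases` holds one entry per audio channel with a `display` string. The entity's `SourceFile` and `Text` columns, the partition key and both helper names are hypothetical; Azure Table storage itself only requires `PartitionKey` and `RowKey`.

```python
import json
import uuid


def transcript_text(result_body):
    """Join the 'display' text of every channel in combinedRecognizedPhrases."""
    result = json.loads(result_body) if isinstance(result_body, str) else result_body
    phrases = result.get("combinedRecognizedPhrases", [])
    return " ".join(p["display"] for p in phrases if p.get("display"))


def table_entity(source_name, text, partition_key="transcriptions"):
    """Build an Azure Table storage entity (PartitionKey and RowKey are required)."""
    return {
        "PartitionKey": partition_key,    # hypothetical partitioning scheme
        "RowKey": str(uuid.uuid4()),      # unique key for each transcript
        "SourceFile": source_name,        # hypothetical column
        "Text": text,                     # hypothetical column
    }


# Trimmed, illustrative result file.
sample_result = {
    "source": "https://mystorageacct.blob.core.windows.net/audio/call1.wav",
    "combinedRecognizedPhrases": [
        {"channel": 0, "display": "Hello, thank you for calling."},
    ],
}
text = transcript_text(sample_result)
entity = table_entity("call1.wav", text)
print(entity["Text"])
```

A table string property is limited to 64 KiB, so for long calls it may be safer to keep the full text in the blob from Step 12.3 and store only a reference or summary in the table.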

Logic App ‘Run instance Window’

Step 13: A 'Delete Transcription' action deletes the transcription by its ID. This is the final step of the transcription process.
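The clean-up call is a DELETE on the transcription's 'self' URL. A minimal standard-library sketch follows; the URL, key and the function name `build_delete_transcription_request` are placeholders or hypothetical, and the request is only built, not sent. Run it only after the text has been saved in Steps 12.3 and 12.4.

```python
import urllib.request


def build_delete_transcription_request(self_url, speech_key):
    """Build (but do not send) the DELETE call for a finished transcription job."""
    return urllib.request.Request(
        self_url,                                         # the job's 'self' URL
        headers={"Ocp-Apim-Subscription-Key": speech_key},
        method="DELETE",
    )


req = build_delete_transcription_request(
    "https://westeurope.api.cognitive.microsoft.com/speechtotext/v3.0/transcriptions/1234",
    "<your-speech-key>")
print(req.get_method(), req.full_url)
```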

Conclusion

I hope this article helps you automate the repetitive process of speech-to-text transcription and adapt this flow to your business requirements. Thank you for taking the time to read it.

References:

How to use batch transcription, Speech service, Azure Cognitive Services (Microsoft Docs)

Speech to text API v3.0 Swagger document: Cognitive Services APIs Reference (microsoft.com)
