Confidentiality for translators and interpreters in the age of AI
Speech recognition, artificial intelligence, and other computer-assisted tools can save us time, improve quality and boost productivity.
But... what about confidentiality?
In this article, we’ll take a closer look at how leading terminology, speech recognition and other AI tools for translators and interpreters address confidentiality and data protection.
If it’s online, it’s never 100% confidential
To be clear: no tool can guarantee complete protection. Data sent over the internet passes through so many moving parts that it is never truly safe.
But the internet has permeated our businesses.
Email. Online collaboration. Remote conferencing platforms.
Every cloud-based tool shares and stores information over the internet. What’s a language professional to do?
First, remember that online companies are incentivised to keep customer data safe. It should be in their interest to avoid leaks and protect information through good security practice such as strong encryption. Before you sign up for a service, do some due diligence on their security track record.
Second, take control of keeping your data safer: use a password manager like 1Password, turn on two-factor authentication wherever available, and use a good Virtual Private Network (VPN) tool like TunnelBear when accessing the internet through unsecured public Wi-Fi in coffee shops or airports.
Third, read on for more about which tools offer the best data protection. 😉
Terminology tools
Except for extremely client-specific terms, terminology is not usually confidential. After all, it’s basically a list of words related to a specific subject. Therefore, sharing term lists with a colleague you’re working with is generally safe.
Trickier issues arise when using terminology extraction technology with private or confidential documents. Let’s see how the leading tools address this aspect of confidentiality.
InterpretBank
InterpretBank is the only widely used tool that extracts terminology on your device without connecting to the cloud.
InterpretBank also includes translation suggestions for individual entries or for an entire glossary. This feature sends terms to a server and requests translations; neither searches nor results are logged. You can also disable the program’s web access and only use offline resources included in InterpretBank, like IATE.
If you choose to share a glossary through InterpretBank, it is uploaded to your account in the cloud, but according to the company, the “database is not processed by us or third parties, [except] for the purpose of maintaining the service in function.”
InterpretBank’s data security page lacks information about how data is processed when working with the cloud-based Automatic Speech Recognition feature. As it most likely connects to commercial speech recognition engines, I would not use it with internal or confidential material.
Interpreters’ Help
The terminology extraction feature built into Interpreters’ Help is simple but efficient.
Copy a text and its translation into the source and target language columns. Read through the text, highlight terms and their translations, and add them to your glossary by hitting Enter.
According to the developer, texts copied into the terminology extraction interface are never uploaded to the web, so your data is fully confidential.
Sketch Engine
Sketch Engine’s OneClick Terms is one of the most powerful terminology extraction tools on the market. (Learn more in my blog post, How to use Sketch Engine to extract terminology from a document or parallel texts in just a few clicks.)
OneClick Terms offers high-quality terminology extraction by comparing term candidates in your document(s) to huge reference corpora. This requires the power of cloud computing, but there’s good news: According to the developers, your data is processed in a secure data center, is never shared, and is automatically deleted after three days.
The company behind Sketch Engine, Lexical Computing, is also certified under ISO 27001, the international standard for information security management.
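To make the idea of comparing term candidates against a reference corpus more concrete, here is a minimal Python sketch. It is not Sketch Engine's implementation: the function names, the single-word tokenizer and the `smoothing` parameter are all assumptions chosen for illustration, and real tools also handle multi-word terms and grammar patterns. The sketch scores each word by how much more frequent it is in your document than in general language, and it runs entirely on your own machine.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())


def per_million(counts):
    """Convert raw word counts into frequencies per million tokens."""
    total = sum(counts.values())
    return {word: count * 1_000_000 / total for word, count in counts.items()}


def keyness_scores(focus_text, reference_text, smoothing=1.0):
    """Score each word in the focus text against the reference text.

    The score is a ratio of smoothed relative frequencies. The smoothing
    constant (an assumed default) avoids division by zero for words that
    never occur in the reference text.
    """
    focus = per_million(Counter(tokenize(focus_text)))
    reference = per_million(Counter(tokenize(reference_text)))
    return {
        word: (freq + smoothing) / (reference.get(word, 0.0) + smoothing)
        for word, freq in focus.items()
    }


def top_terms(focus_text, reference_text, n=5):
    """Return the n words that stand out most in the focus text."""
    scores = keyness_scores(focus_text, reference_text)
    return sorted(scores, key=scores.get, reverse=True)[:n]


if __name__ == "__main__":
    document = "The arbitration tribunal heard the arbitration claim"
    general_language = "the cat sat on the mat and the dog heard the noise"
    print(top_terms(document, general_language, n=3))
```

Words such as "the" score low because they are just as common in general language, while subject-specific words rise to the top. The larger and more representative the reference corpus, the better this separation works, which is why cloud tools with huge corpora tend to give better results than a small local word list.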
Speech recognition tools
As speech recognition continues to improve, more interpreters are incorporating it into their work. (Here’s an example from insiders member Lilia Pino Blouin.)
Most speech recognition takes place online, since significant computing power is required to obtain high-quality results.
(Programs like Dragon Dictate and Cabolo do provide offline speech recognition, but Dragon only supports a small number of languages, must be trained for a single user’s voice, and only runs on Windows, while Cabolo targets corporate users and is probably too expensive for individual interpreters.)
Most speech recognition tools designed for the general public, like Otter.ai and Maestra, use commercial speech recognition engines. It’s probably fine to use them for public meetings, but I’d avoid them in confidential settings unless your client explicitly permits it.
The newest computer-assisted interpreting tool, Cymo Note, is based on speech recognition. The developers address confidentiality by not storing your data on their servers and referring you to the privacy policies for each of the available speech recognition engines.
In general, I recommend educating clients about how speech recognition can help you produce high-quality work for their benefit, and requesting their explicit written permission to use such tools.
ChatGPT and Artificial Intelligence
AI tools like ChatGPT are taking the world by storm, and the language professions are no exception.
For example, colleagues have explored using ChatGPT for terminology extraction and glossary creation. While this is fine for publicly available documents, do not use ChatGPT with any confidential data. Anything you copy into ChatGPT may be used to train the underlying model.
Luckily, some third-party apps allow you to tap into the large language model behind ChatGPT without feeding it your data.
You can set up your own OpenAI account, then ask the company to exclude your data from model training. You can then connect extensions like GPT for Sheets and Docs to run GPT features such as summarization or machine translation on your documents or spreadsheets. This is also the least expensive way to leverage artificial intelligence. (Some elbow grease required.)
More good news: One of my two favorite AI-powered preparation tools explicitly prohibits AI from using your information as training data.
Readwise Reader
I’m a huge fan of the AI-powered Ghostreader feature in Readwise Reader. Add a publicly available article, PDF, or video to your Reader, then use Ghostreader to define words in context, look up people & places, simplify complex language, and summarize paragraphs or entire documents. You can also generate lists of terms or key ideas in a flash, or extract terminology from your text. (To learn how, check out my insiders webinar on AI-Powered Preparation with Readwise Reader.)
As Readwise's Privacy Policy does not address how the artificial intelligence features in the tool work, I wouldn’t upload any confidential documents to the service. However, it's a great tool for any publicly available documents or videos.
Notion AI
Released in February 2023, Notion AI is an incredibly powerful preparation tool. Create lists of terms, acronyms, and definitions in one or more languages, see how words are used in context, extract terminology, align texts or compile fact sheets. (Learn more in my insiders webinar, AI-Powered Preparation with Notion.)
The best part? Notion’s strong security policy for AI-powered features, which explicitly states: “The Notion AI Writing Suite will not use your data to train our models...We do not allow any partners or 3rd parties to use your data for training their models or any other purpose.”
Notion also has strong security practices, including encryption, quarterly independent security audits, and GDPR compliance.
As a result, Notion AI might just be the safest, easiest, and most affordable way for translators and interpreters to tap into the power of ChatGPT without their data being used to train the model.
Machine translation
Machine translation tools can significantly streamline our work, offering a first version of a text we’re translating or a speech we’ll interpret.
The same principles outlined above apply to machine translation: Do not simply hand over your valuable information to algorithm training. Instead, pick machine translation engines where your confidential information is explicitly excluded from training data.
Google Translate
Do not use Google Translate for sensitive information.
Everything you translate with Google Translate is stored and analyzed by Google. Google’s Terms of Service explicitly grant the company the right to use “automated systems and algorithms to analyze your content...to recognize patterns in data.”
DeepL
DeepL offers a free plan and several paid options. DeepL’s terms explicitly state that data processed using the Free version of the service may be stored. Opt instead for a Pro license, where data is encrypted end-to-end, text is immediately deleted after translation, and “DeepL Pro subscribers’ texts are not used to train our models.” (You’ll get plenty of other perks, too, like DeepL plug-ins for a range of CAT tools.)
Artificial intelligence and machine translation
Do not use ChatGPT for machine translation. As explained above, your information may be used to train the model.
Instead, opt for a safer option. Set up your own OpenAI account, opt out of data collection, and machine translate your text with the GPT for Sheets and Docs plugin. Or easier still, use Notion AI. 😉
Final thoughts on data protection in the age of AI
Confidentiality is the cornerstone of translation and interpreting ethics. This should also extend to the tools we use to do our work.
Some computer-assisted interpreting tools offer much stronger data protection than others.
Your client’s data is safe with features that work offline, like the term extraction in InterpretBank and Interpreters’ Help.
Speech recognition requires considerable computing power and is nearly always web-based. Do not use it for confidential meetings unless your client explicitly approves.
If using machine translation, opt for a paid version where your data does not train the machine.
And if you’re going to use artificial intelligence, pick Notion AI, which explicitly keeps your content out of training data.
Finally, educate your clients. Explain how AI tools can help you work better, as well as how you and these companies protect their data. Get explicit written permission to use speech recognition or other artificial intelligence technology, and where possible, include this in your contracts. It’s always better to be safe than sorry!
Disclaimer: This article does not constitute legal advice. Consult a local attorney to obtain legal advice about your personal situation and your country's legal system.
By making use of the information provided herein in any way whatsoever, you waive all claims of liability of any and every nature whatsoever against techforword Sàrl and Joshua Goldsmith.