Cymo Note: Speech recognition meets automated note-taking

Is note-taking the bane of every interpreter’s existence? For many of us, it sure is. What if there were a way to “automate” the whole thing?

Enter Cymo Note, a tool that combines a running transcription (highlighting key terms and figures) with a virtual notepad.

Check out this very short demo of Cymo Note.

How speech recognition works

Speech recognition takes the sound waves that make up speech and turns them into ones and zeros that a computer can work with. This process takes (hopefully high-quality) audio, filters out background noise, evens out the volume and breaks everything down into digital snapshots of roughly 25 milliseconds each. (Think of it like a film which is composed of individual still images.)
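To make the "snapshot" idea concrete, here is a minimal sketch of how audio might be cut into 25-millisecond frames. It is an illustration, not how Cymo Note or any particular engine does it: the 16 kHz sample rate, the 10 ms step between frames, and the function name frame_signal are all assumptions made for the example.

```python
import math

def frame_signal(samples, sample_rate, frame_ms=25, hop_ms=10):
    """Split a list of audio samples into overlapping frames.

    frame_ms is the length of each "snapshot"; hop_ms (an assumed value)
    is how far the window moves forward between snapshots.
    """
    frame_len = int(sample_rate * frame_ms / 1000)  # samples per frame
    hop_len = int(sample_rate * hop_ms / 1000)      # samples between frame starts
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop_len):
        frames.append(samples[start:start + frame_len])
    return frames

# One second of a synthetic 440 Hz tone stands in for real speech.
sample_rate = 16_000
one_second = [math.sin(2 * math.pi * 440 * n / sample_rate)
              for n in range(sample_rate)]

frames = frame_signal(one_second, sample_rate)
print(len(frames), len(frames[0]))  # 98 frames of 400 samples each
```

Because the frames overlap in this sketch, one second of audio yields more than the 40 snapshots you would get by cutting it into back-to-back 25 ms slices.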

Using complicated statistical models, the computer puts these “snapshots” back together into phonemes, the basic acoustic building blocks of language. (The word ‘phoneme’, for example, includes the consonant phonemes /f/, /n/, and /m/.)

Since different languages include different sounds, the number of phonemes varies, with English clocking in at roughly 40 to 45, depending on the accent and how you count!

Using advanced math and clever programming, the computer assembles phonemes into words and sentences.

Cymo Note does speech recognition in real time. Post-processed speech recognition can take a bit longer to check for potential errors and guess the most likely punctuation; real-time speech recognition has to deliver quickly. As a result, the output tends to be slightly less accurate, and punctuation is often missing. You’re also likely to see “flickering,” where the tool revises text it has already displayed as more audio comes in.
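Flickering is easier to picture with a toy example. Many streaming recognizers send provisional ("interim") guesses that are later overwritten by a final version. The sketch below simulates that behavior with made-up updates; the (text, is_final) format and the function names are hypothetical and do not describe Cymo Note's actual engine output.

```python
def render(committed, interim):
    """Build the line shown on screen from final text plus the current guess."""
    return " ".join(part for part in (" ".join(committed), interim) if part)

def apply_updates(updates):
    """Replay (text, is_final) updates and return every screen state in order."""
    committed = []  # segments the engine has finalized
    screens = []
    for text, is_final in updates:
        if is_final:
            committed.append(text)  # lock the segment in
            interim = ""
        else:
            interim = text          # provisional guess, may still change
        screens.append(render(committed, interim))
    return screens

# Invented updates: the engine first mishears, then corrects itself.
updates = [
    ("the prize", False),
    ("the price of", False),
    ("the price of oil rose", True),
    ("by for", False),
    ("by four percent", True),
]

for screen in apply_updates(updates):
    print(screen)
```

Reading the printed lines top to bottom, "the prize" turns into "the price of" and "by for" turns into "by four percent": that rewriting of already displayed text is the flicker.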

How we use speech recognition every day

We’ve probably all dictated to our phone or computer, asked voice assistants to play music or give us directions, or turned on automatic captions during online meetings. Speech recognition is behind all of that.

For years, Dragon has been the gold standard for dictation software in business, legal and medical fields. For interpreting, however, Dragon is less than ideal since it only supports a small number of languages, must be trained for a single user’s voice, and only runs on Windows.

Recently, web-based speech recognition has made huge strides. Otter.ai is well known for both real-time and post-processed transcription (albeit only in English), while Zoom offers automatic transcription of calls and meetings in a dozen languages. In terms of language coverage, Maestra’s Web Captioner leads the pack with over 40 languages – for free.

Language professionals can leverage speech recognition tools in many ways. Prefer to dictate translations instead of typing them? Take Dragon for a spin (if your languages are covered). Want to automatically generate subtitles, then post-edit them? Sonix can help you in over 35 languages and understands dozens of file formats. Have you ever wanted to edit video, but felt overwhelmed by complicated and expensive software? Then look no further than Descript, which makes editing video and audio as easy as working with text. It has completely revolutionized the way we edit and caption techforword videos, and supports 20+ languages. Want to dictate high-quality subtitles in real time using the respeaking technique? Dragon leads the pack.

(For a hands-on exploration of some of these tools, check out our insiders webinars on Semi-automatic subtitling with Sonix and Editing video like text with Descript.)

Speech recognition for interpreters

Many interpreters are adopting tools like Web Captioner to generate real-time transcriptions during simultaneous interpreting assignments.

While some colleagues find a running transcription to be helpful, others are distracted by it. The jury is still out on what the ideal CAI (computer-aided interpreting) tool should look like. For example, the Ergonomics of the Artificial Boothmate project surveyed over 500 interpreters and found a range of preferences.

For years, InterpretBank has offered an experimental speech-recognition tool that extracts key information from a running transcription, displaying only names, numbers, and key terms from your glossary, along with their translations. (Check out this video demo of InterpretBank’s speech recognition.)

Some interpreters have also explored a hybrid interpreting model called SightConsec, where you work primarily from a machine-generated transcription in a consecutive setting, and may not take notes at all. (Curious about this technique? Check out this interview with Lilia Pino-Blouin to learn more.)

Cymo Note takes this one step further, building automated note-taking into a computer-assisted interpreting tool.

How Cymo Note works

Released in late 2022, Cymo Note aims to bring automatic speech recognition to the full range of interpreting settings (remote, onsite, and hybrid meetings) and modalities (simultaneous, consecutive, and hybrid approaches).

Billing itself as a “professional multilingual note-taking software for interpreters,” Cymo Note highlights numbers, names, and terminology in a running transcription.

The tool, which has apps for Windows, Mac, and iPad, is the first interpreting software to adopt a pay-per-minute pricing system: You purchase credits and pay different amounts based on which flavor of speech recognition you use. Available engines include Microsoft ASR (about $10/hr), Tencent Speech-to-text ($4.50/hr), iFlytek Speech-to-text ($6/hr), and the proprietary Cymo Speech Engine ($3.50/hr). Alternatively, a flat-rate subscription will give you unlimited access for $58/month. This pricing model reflects the fact that running speech recognition engines is not cheap; charging users for that processing power makes the service sustainable.
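If you are wondering when the subscription starts to pay off, the arithmetic is simple enough to sketch. The figures below are the ones quoted above (the Microsoft rate is approximate); the function names are made up for the example, and the sketch assumes the flat rate covers whichever engine you would otherwise pay for by the minute.

```python
# Hourly rates in USD as quoted in this article (Microsoft ASR is "about" $10).
HOURLY_RATES_USD = {
    "Microsoft ASR": 10.00,
    "Tencent Speech-to-text": 4.50,
    "iFlytek Speech-to-text": 6.00,
    "Cymo Speech Engine": 3.50,
}
FLAT_RATE_USD_PER_MONTH = 58.00

def break_even_hours(hourly_rate, flat_rate=FLAT_RATE_USD_PER_MONTH):
    """Monthly hours at which pay-per-minute costs as much as the subscription."""
    return flat_rate / hourly_rate

def cheaper_option(engine, hours_per_month):
    """Compare the pay-per-minute total with the flat rate for one month."""
    pay_as_you_go = HOURLY_RATES_USD[engine] * hours_per_month
    if pay_as_you_go > FLAT_RATE_USD_PER_MONTH:
        return "subscription"
    return "pay-per-minute"

for engine, rate in HOURLY_RATES_USD.items():
    print(f"{engine}: break-even at {break_even_hours(rate):.1f} h/month")
# Microsoft ASR: break-even at 5.8 h/month
# Tencent Speech-to-text: break-even at 12.9 h/month
# iFlytek Speech-to-text: break-even at 9.7 h/month
# Cymo Speech Engine: break-even at 16.6 h/month
```

In other words, at these rates an interpreter using the proprietary engine for fewer than about 16 hours a month would likely pay less with credits, while heavy users of the Microsoft engine would reach the subscription price after roughly six hours.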

Building your glossary in Cymo Note

Creating a glossary in Cymo Note is incredibly easy.

First, choose your preferred speech recognition engine and language combination from the 15+ supported languages (including Arabic, Cantonese, Dutch, English, French, German, Italian, Japanese, Korean, Mandarin, Polish, Portuguese, Russian, Spanish, and Swedish). 

Now, select your microphone or audio input. You can use the computer microphone, connect your laptop to a hardware console, or route audio from your remote interpreting platform into Cymo (Windows or Mac).

Lastly, turn on the transcription, and you’re up and running. You can also switch the transcription language with a single click without having to interrupt the transcription.

See a term you’d like to add to your glossary? Simply highlight it in the transcription. A machine translation into your second language automatically pops up. Star it to add it to your glossary, or click to edit the translation, then hit Enter to add it to your glossary.

You can also type a term into the search box at the top of the screen, then follow the same process mentioned above. Cymo Note even lets you copy and paste an entire glossary. To preserve confidentiality, only a single glossary is ever stored on your device, but you can always export a glossary, then import it back into the app, e.g. when working for repeat clients.

Cymo Note also includes a unique “force replacement” feature. Whenever a term has been transcribed incorrectly, select it and click the force transcribe button (an icon including the letters A and B), and type in the correct transcription. From then on, the correct term will be highlighted in bold every time it appears in your text – perfect for adding in names of speakers, companies, or products and quickly drawing your eye to them while you’re interpreting.
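Conceptually, force replacement behaves like a find-and-replace rule applied to the incoming transcript. The sketch below is only an illustration of that idea, not Cymo Note's implementation: the function name, the sample mis-transcriptions, and the use of asterisks to stand in for bold text are all assumptions.

```python
import re

def force_replace(transcript, replacements):
    """Swap known mis-transcriptions for the correct term, marked as bold.

    replacements maps the wrong text (as the engine tends to produce it)
    to the spelling you want to see. Asterisks stand in for bold here.
    """
    for wrong, right in replacements.items():
        # Whole-word, case-insensitive match on the mis-transcribed text.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b", re.IGNORECASE)
        transcript = pattern.sub(lambda match, right=right: f"**{right}**",
                                 transcript)
    return transcript

# Invented examples of how an engine might mangle proper names.
rules = {"sigh mo": "Cymo", "tech for word": "techforword"}
line = "welcome to the sigh mo note webinar by tech for word"

print(force_replace(line, rules))
# welcome to the **Cymo** note webinar by **techforword**
```

This sketch only rewrites the mis-transcribed form; it does not also bold occurrences the engine already got right.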

Cymo Note lends itself well to a novel preparation technique. Turn on the speech recognition engine, and read out a speech on a topic that’s related to your meeting. After dictating for a few minutes, read through the transcript, highlight potential terms, quickly add equivalents, and use force replace to correct proper names so they will display correctly during your assignment.

Using Cymo Note for consecutive interpreting

Cymo Note’s most innovative aspect is its consecutive interpreting feature. 

Turn on “Consecutive layout,” and the screen will automatically be divided in two, with the transcription on the left and a blank space for taking notes on the right.

Enable “Drawing mode,” and start taking notes. (You’ll get the best results when you run Cymo Note on your tablet or touchscreen device and use a stylus.) You can also adjust the brush size and color, or change the font size (up to 40 pt) with a simple slider.

Turn on “Consec bookmarks” to quickly add a bookmark at the beginning of a speech (or any place you’d like to jump to), then click the line-and-arrow icon in the bottom-right corner to jump straight there.

Screenshot of Cymo Note in consecutive layout with drawing mode on

With “Drawing mode”, you can take notes anywhere on the page – including on the transcribed text. The possibilities are endless: Draw a line to link notes to a specific term in the text, underline a key concept, or strike through repetitive sections you want to skip during your consecutive rendition. Feeling brave? Why not try to forgo notes entirely and spend your time reading the live transcription, jotting down potential translations, or making minor handwritten corrections to the text for a computer-assisted sight translation?

Check out this video to see how to set up Cymo Note’s consecutive interpreting features.

Before you dive in

First, Cymo Note requires practice: Make sure to get familiar with the technology before an actual assignment. 

As with all web-based applications, confidentiality is also an issue. The developers address this by not storing your data on their servers, using speech recognition engines with strong encryption, and suggesting that your client sign a non-disclosure agreement permitting you to run the meeting through web-based speech recognition technology. In any case, always exercise caution and speak to your client before using web-based speech recognition tools when confidential information is discussed or likely to come up.

For those unsure about how to properly route audio from an RSI platform into Cymo Note, the team behind the app have provided extensive documentation and video tutorials explaining how to go about it.

To my knowledge, Cymo Note cannot differentiate between a word’s root form and its variants (e.g. singular and plural, cases, etc.); it will only find a term if that exact form is in your glossary.
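Here is a small sketch of what exact-form matching looks like in practice and why variants slip through. It is a guess at the general behavior described above, not Cymo Note's actual matching logic; the glossary entries and the function name find_terms are invented for the example.

```python
import re

def find_terms(transcript, glossary):
    """Return glossary entries whose exact form appears as whole words."""
    text = transcript.lower()
    hits = {}
    for term, translation in glossary.items():
        # No stemming or lemmatization: the term must appear exactly as stored.
        if re.search(r"\b" + re.escape(term.lower()) + r"\b", text):
            hits[term] = translation
    return hits

# Invented English > Spanish glossary.
glossary = {"interest rate": "tipo de interés", "subsidy": "subvención"}
sentence = "The central bank raised interest rates and cut one subsidy"

print(find_terms(sentence, glossary))
# {'subsidy': 'subvención'}

# A possible workaround under this assumption: store the variant as well.
glossary["interest rates"] = "tipos de interés"
print(find_terms(sentence, glossary))
```

The plural "interest rates" is missed on the first pass because only the singular is stored, so it may be worth adding the variants you expect to hear when you prepare your glossary.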

Finally, although the pricing model makes sense, some colleagues might find the cost steep. Luckily, Cymo Note offers 20 trial credits, which get you up to 60 minutes of testing time (depending on the speech recognition engine you use).

Is Cymo Note right for you?

Cymo Note offers a truly novel approach to speech recognition for interpreters, with running live transcription and extensive note-taking possibilities.

If you find a running transcription during simultaneous interpreting useful, Cymo Note provides a turbocharged version that highlights key names, figures and terms.

Where Cymo Note truly shines, however, is in consecutive settings. It offers unique features that are not currently available in any other tool, including the ability to:

  • annotate a live transcription

  • complement the transcript with full or partial notes

  • add bookmarks to easily navigate notes

  • highlight numbers

  • highlight (and correct) proper names, and 

  • display terms and their equivalents from your glossary in real time.

It also streamlines glossary creation through a unique approach combining speech recognition and machine translation, with a human in the loop to review and approve terms.

I find Cymo Note a welcome and unique addition to the computer-assisted interpreting tools currently on the market.

If you’re looking for a speech recognition tool designed for interpreters, covering 15+ languages, and offering turbocharged live transcription – plus unparalleled features for consecutive – look no further than Cymo Note!

Want to learn how to get started with Cymo Note? A full walkthrough of how to use the app is included in the insiders library and in our Speech Recognition for Interpreters course.

This article is based on our techforword insiders webinar Automated note-taking with Cymo Note. For a deep dive and hands-on practice to test out these techniques, check out the replay. Not an insiders member yet?

Join now to access 50+ hours of training to boost your translation or interpreting in the premier community for innovative language professionals!
