
You can also use multiple transcription engines and then use mismatches among the text streams to narrow down the % of content that needs to be reviewed. This is quite similar to multi-voting OCR for document images.
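A minimal sketch of the mismatch idea for two engines, assuming each transcript is already available as a list of words. The function name `review_spans` is hypothetical, and Python's standard `difflib` stands in for whatever alignment method a real pipeline would use.

```python
from difflib import SequenceMatcher

def review_spans(a_words, b_words):
    """Find where two transcripts disagree.

    Returns (spans, fraction): spans are (start, end) index ranges into
    a_words that do not match b_words, and fraction is the share of
    a_words falling inside those spans.
    """
    # autojunk=False so that frequent words are not treated as noise
    matcher = SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    spans = [
        (i1, i2)
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]
    flagged = sum(i2 - i1 for i1, i2 in spans)
    return spans, flagged / max(len(a_words), 1)

engine_a = "the quick brown fox jumps".split()
engine_b = "the quick brown box jumps".split()
spans, fraction = review_spans(engine_a, engine_b)
print(spans, fraction)
```

Only the flagged spans go to a human reviewer; everything the two engines agree on is accepted as is.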

The principle is that the engines (hopefully) have different failure modes, so each engine's 2-3% error rate falls in different areas of the audio. The key underlying assumption is that the engines' errors rarely land on the same words (ideally the error events are mutually exclusive).
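To get a feel for how much that assumption matters, here is a rough calculation under a weaker assumption that is mine, not the comment's: each engine errs independently at a given word with probability p. The function name `at_least_two_wrong` is hypothetical.

```python
def at_least_two_wrong(p):
    """Probability that at least 2 of 3 engines err on the same word,
    assuming (hypothetically) independent errors with per-word rate p."""
    exactly_two = 3 * p**2 * (1 - p)  # 3 ways to pick the two engines that err
    all_three = p**3
    return exactly_two + all_three

for p in (0.02, 0.03):
    print(p, at_least_two_wrong(p))
```

For the 2-3% per-engine rates above this comes out to roughly 0.1% to 0.3% of words, and that is an upper bound on wrong votes, since two engines that both err still have to make the same mistake to outvote the third. Correlated failures (e.g. all engines struggling with the same noisy segment) would make the real number worse.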

With 3 engines, you can use something like a 2-of-3 vote: wherever two streams match, they override the stream that mismatches.
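A sketch of the 2-of-3 vote, assuming the three streams have already been aligned word by word to the same length (alignment is the hard part in practice and is not shown here). The name `vote_2_of_3` and the fallback to engine 1 when there is no majority are illustrative choices, not part of the original suggestion.

```python
from collections import Counter

def vote_2_of_3(s1, s2, s3):
    """Merge three pre-aligned word streams by majority vote.

    Returns (merged, needs_review): merged is the voted word list, and
    needs_review lists the positions where all three engines disagree.
    """
    if not (len(s1) == len(s2) == len(s3)):
        raise ValueError("streams must be pre-aligned to equal length")
    merged, needs_review = [], []
    for i, words in enumerate(zip(s1, s2, s3)):
        word, count = Counter(words).most_common(1)[0]
        if count >= 2:
            merged.append(word)      # at least two engines agree
        else:
            merged.append(words[0])  # no majority: keep engine 1 as a placeholder
            needs_review.append(i)   # and flag the position for a human
    return merged, needs_review

e1 = "please call me back tomorrow".split()
e2 = "please call me bag tomorrow".split()
e3 = "please fall me beck tomorrow".split()
merged, needs_review = vote_2_of_3(e1, e2, e3)
print(" ".join(merged), needs_review)
```

Positions with a majority are resolved automatically, so only the three-way disagreements are left for manual review.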


