I'm not sure I see why you would want to base this on Markdown. Markdown is designed for a very specific niche, and this falls far outside that niche.
It seems it would make a lot more sense to just design the language from scratch, rather than try to bend Markdown to do something it was not at all meant for.
For instance, why would you WANT to have an example like this:
I totally disagree, and I have the exact opposite reaction. Markdown is something tons of people know already. I literally just glanced over the article and felt I could generate a "narrated PowerPoint", which seems like the main purpose of this, extremely quickly. Why would I want to learn a completely new language because there are some trivially minor syntax oddities with using Markdown?
Btw, for narrated PowerPoint, you can use Video Puppet directly with PowerPoint files: just put the narration into the speaker notes. Here's more info on that: https://videopuppet.com/docs/powerpoint/
Because you already have to learn a bunch of new stuff, since Markdown does not support this use case.
You could easily borrow some common things from Markdown to make things easier, but this seems to follow Markdown syntax as closely as possible, even when that syntax makes no sense in context.
It is much better to invent new things for the cases that are completely new than to try to force a square peg into a round hole.
You can use JSON or YAML if you'd like more structure. Markdown has good editor support, so using it as the source for videos means video sources render nicely on GitHub, for example.
Also, if you don't like ![](), you can just use stage directions in brackets. The equivalent script would be:
Why have two ways to do the same thing, where one is awkward and the other is not? Just commit to doing things the less awkward way, and throw out the idea that you need to be backwards compatible with something designed for a completely different purpose.
It's readable, with some intelligence and decoration that isn't distracting. See it enough and the syntax becomes invisible over time, like punctuation.
It seems it would make a lot more sense to just design the language from scratch, rather than try to bend Markdown to do something it was not at all meant for.
For instance, why would you WANT to have an example like this:

Welcome to London
---

Welcome to Berlin