
>"If you were given remote control of two robot arms, and just one camera to look through, how many different tasks do you think you could complete successfully?"

There are an infinite number of 3D scenes that can produce the same 2D picture. And what is a scene, really? Last time I checked, raw RGB values were not a good input representation in computer vision; the field instead relied on CNNs extracting increasingly abstract levels of gradient features to build up a compositional scene. None of that is particularly translatable to how an LM works with text.
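
To make the ambiguity point concrete, here is a rough sketch assuming an idealized pinhole camera with no lens distortion. The `project` helper and the focal length of 1.0 are made up for illustration and are not from any particular library. Every 3D point along the same ray from the camera center lands on the same pixel, so a single image cannot tell those points apart.

```python
import numpy as np

def project(point_3d, focal_length=1.0):
    """Hypothetical pinhole projection: 3D camera-frame point -> 2D image coords."""
    x, y, z = point_3d
    return np.array([focal_length * x / z, focal_length * y / z])

# One viewing ray from the camera center (assumed direction, z pointing forward).
ray = np.array([0.2, -0.1, 1.0])

# Points at very different depths along that ray.
depths = [0.5, 1.0, 3.0, 10.0]
pixels = [project(d * ray) for d in depths]

# All of these project to the same image location, up to floating-point error.
same_pixel = all(np.allclose(p, pixels[0]) for p in pixels)
print(same_pixel)
```

Depth along the ray is a free parameter here, which is one way to see why one camera view admits infinitely many consistent scenes.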


