
The next step is video. Adding a temporal dimension will emphasize extrapolating the true 3D shapes of recognized objects.


Not just 3D shapes, but also understanding actions as they unfold over time, using recurrent neural networks.


What if you took these same neural networks as they exist now and tweaked the input and parameters slightly? For the input, use the individual frames of an hour-long video of a leopard, in order. Instead of having the network just identify whether there is a leopard, have it identify what in each frame is the leopard, and have it try to predict the next frame.
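The next-frame-prediction idea can be sketched with a toy model. This is a hypothetical illustration, not anyone's actual system: frames are flattened into vectors, the "video" is synthetic, and the predictor is a single linear map trained by gradient descent rather than a deep network.

```python
import numpy as np

# Hypothetical sketch: treat each video frame as a flat vector and learn a
# linear next-frame predictor by gradient descent. The frame size and the
# synthetic "video" (a smooth random walk) are made up for illustration.
rng = np.random.default_rng(0)
n_frames, dim = 50, 16  # tiny "video": 50 frames of 16 pixels each
frames = np.cumsum(rng.normal(size=(n_frames, dim)) * 0.1, axis=0)

W = np.zeros((dim, dim))  # linear predictor: next_frame ≈ frame @ W
lr = 0.01
for _ in range(200):
    pred = frames[:-1] @ W          # predict frame t+1 from frame t
    err = pred - frames[1:]         # prediction error against true frames
    W -= lr * frames[:-1].T @ err / (n_frames - 1)

mse = float(np.mean((frames[:-1] @ W - frames[1:]) ** 2))
print(f"next-frame MSE after training: {mse:.4f}")
```

A real system would replace the linear map with a convolutional or recurrent network, but the training signal is the same: the next frame itself serves as the label, so no hand-labeling is needed.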

This seems closer to the way we learn to identify things. Once we establish an understanding of a base class (big cat), we can apply that same model to new cats we have never seen before from just a picture.


I think a recurrent neural network is more suitable here, since the amount of context for each example might differ, rather than fixing the number of frames.
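The point about variable-length context can be seen in a minimal vanilla RNN cell: because one set of weights is shared across time steps, the same model consumes clips of any length. Sizes and data below are illustrative assumptions, not from any particular system.

```python
import numpy as np

# Hypothetical sketch: a vanilla RNN cell shares one weight matrix across
# time steps, so it can process clips of any length without fixing a frame
# count. Dimensions and random "clips" are illustrative.
rng = np.random.default_rng(1)
in_dim, hid_dim = 8, 4
Wx = rng.normal(scale=0.1, size=(in_dim, hid_dim))   # input-to-hidden weights
Wh = rng.normal(scale=0.1, size=(hid_dim, hid_dim))  # hidden-to-hidden weights

def encode(clip):
    """Run the RNN over a clip of shape (T, in_dim); return the final hidden state."""
    h = np.zeros(hid_dim)
    for frame in clip:
        h = np.tanh(frame @ Wx + h @ Wh)
    return h

short_clip = rng.normal(size=(5, in_dim))   # a 5-frame clip
long_clip = rng.normal(size=(37, in_dim))   # a 37-frame clip
print(encode(short_clip).shape, encode(long_clip).shape)  # same-shaped summary
```

Both clips collapse to a fixed-size hidden state, which is what lets one network summarize contexts of different lengths.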




