Hacker News
Show HN: I built a tool for mobile and computer use using local and remote LLMs (github.com/bandarlabs)
2 points by mkagenius on Dec 18, 2024 | 3 comments
Created a tool that lets you use LLMs to automate tasks across mobile (Android) and computer. It currently works from screenshots, relying on the LLM's ability to extract UI elements from the screen. This is still a work in progress: I'm trying to make it work with local models via Ollama (the code is in place, with some issues). As of now, Gemini and GPT-4o work best for finding UI elements and planning the task.
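To make the screenshot-driven flow concrete, here is a minimal sketch of a screenshot, plan, find, act loop. It assumes a planner that returns the next step as a small structured action and a finder that maps an element description to pixel coordinates. The "planner" and "finder" roles come from the post, but every name and signature here (`Step`, `Device`, `run_task`) is hypothetical and not clickclickclick's actual API. The LLM calls are injected as plain callables so the loop can be exercised with fakes.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Protocol, Tuple


@dataclass
class Step:
    """One action proposed by the planner (hypothetical schema)."""
    action: str                   # "tap", "type", or "done"
    target: Optional[str] = None  # description of the UI element to act on
    text: Optional[str] = None    # text to type, if any


class Device(Protocol):
    """Hypothetical device interface; a real one might wrap adb."""
    def screenshot(self) -> bytes: ...
    def tap(self, x: int, y: int) -> None: ...
    def type_text(self, text: str) -> None: ...


# Hypothetical signatures: (goal, screenshot, history) -> next step,
# and (screenshot, element description) -> (x, y) or None if not visible.
Planner = Callable[[str, bytes, list], Step]
Finder = Callable[[bytes, str], Optional[Tuple[int, int]]]


def run_task(goal: str, device: Device, planner: Planner, finder: Finder,
             max_steps: int = 20) -> List[tuple]:
    """Repeat: take a screenshot, ask the planner for a step, locate its target, act."""
    history: List[tuple] = []
    for _ in range(max_steps):
        shot = device.screenshot()
        step = planner(goal, shot, history)
        if step.action == "done":
            return history
        if step.target is not None:
            point = finder(shot, step.target)
            if point is None:
                # Element not on this screen: record it so the planner can re-plan.
                history.append(("not_found", step.target))
                continue
            device.tap(*point)  # focus or press the located element
        if step.action == "type" and step.text:
            device.type_text(step.text)
        history.append((step.action, step.target))
    raise RuntimeError("task did not finish within max_steps")
```

Keeping planning and finding as separate callables is what allows mixing models, such as a cheaper planner with a different vision model for locating elements.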

Some examples that work as of now:

1. Use Gmail and ask <friend>@example.com for lunch next Saturday
2. Start a 3+2 chess game on lichess

Working demos: https://github.com/BandarLabs/clickclickclick

This cuts the cost of one automation task from approx. $0.60 with Claude to:

$0.06 with GPT-4o mini as the planner plus the free tier of Gemini 1.5 Flash (15 calls/min)

The Llama vision models will eventually bring it down to zero.
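Going from $0.60 to $0.06 is a 10x reduction per task, but the free Gemini tier caps the finder at 15 calls per minute, so long tasks need to pace their requests. A minimal sliding-window rate limiter sketch, assuming calls come from a single thread; the class name and the injectable clock/sleep parameters are illustrative choices, not part of the project.

```python
import time
from collections import deque
from typing import Callable, Deque


class SlidingWindowLimiter:
    """Allow at most `max_calls` calls in any rolling `window` seconds (single-threaded)."""

    def __init__(self, max_calls: int = 15, window: float = 60.0,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.max_calls = max_calls
        self.window = window
        self.clock = clock
        self.sleep = sleep
        self.calls: Deque[float] = deque()  # timestamps of calls still inside the window

    def acquire(self) -> float:
        """Block until a call is allowed; return the total time spent waiting."""
        waited = 0.0
        while True:
            now = self.clock()
            # Drop timestamps that have aged out of the window.
            while self.calls and now - self.calls[0] >= self.window:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return waited
            # Window is full: sleep until the oldest call expires, then re-check.
            delay = self.window - (now - self.calls[0])
            self.sleep(delay)
            waited += delay
```

Calling `limiter.acquire()` before each finder request keeps a task under the free-tier quota, at the cost of pausing whenever the quota is used up.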



I worked on something similar in the past (end-to-end testing with an AI agent), but I stopped enjoying it after a while.

I'm curious: what's your motivation for building this?

Do you want to integrate it into a product, or was it an attempt to cut down on Claude's cost?


One use case close to my heart is helping older people use phones easily.

Most app UIs built today are difficult to use, and people unfamiliar with technology find them even more challenging.

With voice input, and freed from its dependence on adb, this tool could make phones far more convenient for older people and even for blind users.
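The adb dependency mentioned above typically comes down to a few commands: `adb exec-out screencap -p` to capture the screen as PNG, and `adb shell input tap` / `adb shell input text` to inject input. A minimal wrapper sketch, assuming `adb` is on the PATH; the function names are hypothetical, and this is not necessarily how clickclickclick drives the device.

```python
import shutil
import subprocess
from typing import List, Optional


def adb_cmd(*args: str, serial: Optional[str] = None) -> List[str]:
    """Build an adb argument list, optionally targeting one device with -s."""
    base = ["adb"] + (["-s", serial] if serial else [])
    return base + list(args)


def screenshot_cmd(serial: Optional[str] = None) -> List[str]:
    # exec-out returns raw bytes, avoiding the newline conversion that
    # `adb shell` applies to binary output on some devices.
    return adb_cmd("exec-out", "screencap", "-p", serial=serial)


def tap_cmd(x: int, y: int, serial: Optional[str] = None) -> List[str]:
    return adb_cmd("shell", "input", "tap", str(x), str(y), serial=serial)


def text_cmd(text: str, serial: Optional[str] = None) -> List[str]:
    # `input text` splits on spaces; %s is its escape sequence for a space.
    return adb_cmd("shell", "input", "text", text.replace(" ", "%s"), serial=serial)


def run(cmd: List[str]) -> bytes:
    """Run an adb command and return its stdout; raises if adb is missing or fails."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError("adb not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True).stdout
```

For example, `png = run(screenshot_cmd())` yields PNG bytes to send to the finder. Text containing shell metacharacters (quotes, `&`, `;`) would need additional escaping, because `adb shell` hands the string to the device's shell.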

One component of the tool, the finder, could even be used to draw a walkthrough overlay on the screen.
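If the finder reports an element's bounding box, the same result can drive both a tap (the box's center) and a walkthrough overlay (the box itself, padded). A sketch assuming the box arrives as `[ymin, xmin, ymax, xmax]` normalized to 0 to 1000, the convention Gemini's object-detection examples use; the finder's real output format may differ, and `PixelBox`, `to_pixels`, and `highlight_rect` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class PixelBox:
    left: int
    top: int
    right: int
    bottom: int

    def center(self) -> Tuple[int, int]:
        """Tap target: the integer midpoint of the box."""
        return ((self.left + self.right) // 2, (self.top + self.bottom) // 2)


def to_pixels(box: Sequence[float], width: int, height: int,
              scale: float = 1000.0) -> PixelBox:
    """Convert an assumed [ymin, xmin, ymax, xmax] box in 0..scale to screen pixels."""
    ymin, xmin, ymax, xmax = box
    return PixelBox(
        left=round(xmin / scale * width),
        top=round(ymin / scale * height),
        right=round(xmax / scale * width),
        bottom=round(ymax / scale * height),
    )


def highlight_rect(b: PixelBox, width: int, height: int, pad: int = 8) -> PixelBox:
    """Padded rectangle for a walkthrough overlay, clamped to the screen bounds."""
    return PixelBox(
        left=max(0, b.left - pad),
        top=max(0, b.top - pad),
        right=min(width - 1, b.right + pad),
        bottom=min(height - 1, b.bottom + pad),
    )
```

The center goes to a tap command, while the padded rectangle is what an overlay would outline, for instance with Pillow's `ImageDraw.rectangle`.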

The goal is to eventually have local models drive all of this; the resulting cost reduction is a nice by-product.

Other use cases I can think of are in the security space: for example, automating logins to test apps, or combining the app, mitmproxy, and an LLM to automate security checks.


I like the walkthrough use case.

Older people may get more use out of Google Assistant or Siri for simple tasks like calling, playing music, and playing videos.
