
I am working on a curated database of proxy IP addresses frequently used by bots: https://deviceandbrowserinfo.com/product/proxies-ips

So far I have ~3M distinct IP addresses per 30 days, including 1.7M fresh proxy IPs. The DB contains only verified IP addresses through which I've been able to route traffic. It DOESN'T rely on 3rd party/open-source data sources.

I also made an open-source proxy IP block list based on the data: https://github.com/antoinevastel/avastel-bot-ips-lists



Wouldn’t this end up flagging a lot of residential IPs due to residential proxies?


The DB contains different types of proxies:
- Residential
- ISP
- Data center

I don't include mobile proxies since they're heavily shared, so knowing that an IP address was used as a proxy at some point is basically useless.

Regarding your remark: yes, the DB includes several shared residential IPs, including IPs of legitimate users who may have a shady app that routes traffic through their device. That's why I don't recommend blocking on these IP addresses alone. They're meant to be more of a datapoint/signal to enrich your anti-fraud/anti-bot system. For the block list, however, I analyze the IPs over longer time frames, look at the percentage of IPs in the range that were used as proxies, and generate a confidence score to indicate whether it is safe to block.
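
To make the range-based idea concrete, here is a minimal sketch of one way such a score could be computed. It is not the actual method behind the block list: the /24 grouping, the 0.5 threshold, and the function names `range_confidence` and `blockable_ranges` are all assumptions for illustration, and it ignores the time-frame dimension entirely.

```python
import ipaddress
from collections import defaultdict


def range_confidence(proxy_ips, prefix_len=24):
    """Return, per IPv4 range, the fraction of its addresses seen as proxies.

    Hypothetical scoring: the /24 default is an assumption, not the
    grouping used by the real block list.
    """
    seen = defaultdict(set)
    for ip in proxy_ips:
        # strict=False lets us derive the enclosing network from a host address.
        net = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        seen[net].add(ipaddress.ip_address(ip))
    return {net: len(ips) / net.num_addresses for net, ips in seen.items()}


def blockable_ranges(proxy_ips, min_confidence=0.5, prefix_len=24):
    """List ranges whose proxy share meets an (assumed) confidence threshold."""
    scores = range_confidence(proxy_ips, prefix_len)
    return sorted(str(net) for net, score in scores.items() if score >= min_confidence)


# Documentation-only addresses (RFC 5737), used as made-up observations.
observed = [f"203.0.113.{i}" for i in range(200)] + ["198.51.100.7"]
ranges_to_block = blockable_ranges(observed)
```

With these made-up observations, the range where 200 of 256 addresses were seen as proxies passes the threshold, while the range with a single hit does not, so a lone residential IP would only ever act as a signal rather than trigger a block.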


Sounds like pretty sophisticated filtering!

I’m working on a scraping project at the moment, so I'm looking at this too, but from the other end. Super low volume though, so pretty tame. The emphasis is on success rate more than throughput.

I bought a 4G dongle to use as a last resort if nothing else gets through. I'm also investigating IPv6.


Using a 4G dongle makes it easier to hide in the crowd indeed. Since your traffic will go through heavily shared mobile IPs, probably with thousands of users behind them, anti-bot vendors won't/shouldn't block per IP, but per fingerprint/session cookie instead.
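
As a rough illustration of blocking per fingerprint/session instead of per IP, here is a small sketch. Everything in it is hypothetical: the `shared_ips` set, the `rate_limit_key` helper, and the `RequestCounter` class are made up for this example, and the counter has no time window, which a real system would need.

```python
from collections import Counter


def rate_limit_key(ip, session_id, fingerprint, shared_ips):
    """Pick what to count requests against.

    On heavily shared IPs (e.g. mobile carrier NAT), key on the session
    cookie, falling back to the fingerprint; otherwise key on the IP.
    """
    if ip in shared_ips:
        if session_id:
            return ("session", session_id)
        return ("fingerprint", fingerprint)
    return ("ip", ip)


class RequestCounter:
    """Naive counter: allow at most `limit` requests per key (no time window)."""

    def __init__(self, limit):
        self.limit = limit
        self.counts = Counter()

    def allow(self, key):
        self.counts[key] += 1
        return self.counts[key] <= self.limit


# Made-up example: one mobile IP shared by a noisy session and a quiet one.
shared_ips = {"192.0.2.10"}
limiter = RequestCounter(limit=2)
noisy = rate_limit_key("192.0.2.10", "sess-a", "fp-1", shared_ips)
quiet = rate_limit_key("192.0.2.10", "sess-b", "fp-2", shared_ips)
noisy_results = [limiter.allow(noisy) for _ in range(3)]
quiet_result = limiter.allow(quiet)
```

Here the noisy session gets cut off on its third request while the other user behind the same IP is unaffected, which is the point of not keying on the shared address.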


Ah, I hadn’t realised it’s the NAT. I thought it was because the IPs are dynamic and rotate too much. Interesting.

Currently planning on doing a layered approach. Cloud IPs first etc.

Interesting challenge but also trying to be somewhat respectful about it since nobody likes aggressive bots



