Hacker Newsnew | past | comments | ask | show | jobs | submit | kiproping's commentslogin

I wanted to do something similar to this, then I started doing some research on birds in general, and those in my locality, then I started learning about Audio and spectograms and Nyquist Theorem and many other interesting audio stuff.

Then I started going through the Intro to Conservation Bioacoustics by Cornell course, and started watching Bioacoustic Talks by the K. Lisa Yang Center cornell center.

And now I am almost at the point where I cant start manually tagging audio sets, for target species so that I can train custom classifiers to identify birds in Rwanda which are poorly detected by birdnet.

TLDR: Being jobless can lead you into interesting ventures.

* Nyquist Theorem. https://www.youtube.com/watch?v=IZJQXlbm2dU

* Intro to Conservation Bioacoustics https://www.birds.cornell.edu/ccb/pam-materials

* Bioacoustic Talks https://www.youtube.com/@CornellSounds


Thanks for sharing these resources and your story! I followed a very similar path, and ended up doing a biodiversity related MSc, with my dissertation being a custom classifier for poorly detected species in Príncipe. BirdNET and Perch are phenomenal achievements, but struggle in regions where, ironically, most of the world’s biodiversity is. What you’re doing for Rwandan species is so important!!

thanks, how is your work going. Any interesting papers you did, what is an area you think needs more research.

Do you think the same could be used for other cases? I'm thinking about detecting problems with cars (vehicles) just by the noise they make

I know google has general sound classifiers like Yamnet, trained on youtube data but they are not very good for specific usecases. So you would have to create a custom model for you usecase.

- https://www.tensorflow.org/hub/tutorials/yamnet


How far has your thinking taken you? This has piqued my interest on how many fields audio can be used to solve problems

just thinking, I've talked with a mechanic but he told me that now when they connect the car to a computer they almost always find anything wrong with a car, that and the experience they have they almost always know what's wrong.

I think sound + location could be really interesting, because you can filter parts of the car that could be making noises that are similar knowing where the mic is.


Birdnet-go is really good and actively maintained. Shout out to tphakala.

This would be a better page to link to https://github.com/esengine/DeepSeek-Reasonix/blob/main/docs...

They explain some of the the reasons why they have a better solution and why they are very opinionated

>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.

So they optimize on this plus other techniques to improve cache hits, making it cheaper.


The last time I heard about something like this, it was Claude Code intentionally injecting random strings to break caching when you're not using a Claude model. Aside from that kind of intentional sabotage, I don't think any coding agent would just ignore prefix caching.

I haven't heard about this, could you please share more info, some reference on that Claude Code intentional bug?

I'm not sure what the mechanism is, but I've definitely had Claude refuse to work on sessions that were touched by other models. Some kind of integrity check failure. Resetting the session back to the point before I used the other model fixed the problem.

IIRC Anthropic's API produces cryptographic signatures for thinking blocks. If you try to submit a set of messages that include thinking blocks with missing/invalid signatures, it'll refuse.

They do this to mitigate jailbreak attempts that rely on fabricated message history (e.g. making it look like the model was compliant in previous messages, increasing the likelihood that it'll continue to be compliant in future messages).



>Most agent loops reorder, rewrite, or inject fresh timestamps each turn

That's really surprising, since it'd defeat the whole point of KV caching. I mean I buy it considering how sloppily coded the harnesses seem to be, but this like obvious low hanging fruit.

I've also often wondered why LLMs aren't trained with a format of having a dedicated contextual system-instruction role at the _end_, which you could use to put context like current time or other misc stuff.


I don't think it's factually correct.

There are context pruning strategies that will prune old messages that are no longer relevant, and context compaction from summaries, etc. But to say "most" do this on "every turn" is overstating things. I think it's more correct to say that "many" do this "occasionally."

I'm also not sure what they mean about injecting fresh timestamps. I could see why you'd prepend/append a timestamp to the user's messages to make the model aware of the current time, and the passage of time, but I can't think of any good reason to edit timestamps in prior messages. I'm sure someone can come up with one, but I'd be very surprised if this was a thing that most agent loops do, let along doing it on every turn.


i put together this, for myself so i can try to track what coding agents are doing, I add agents to it or topics (like caching, or sandboxing, file editing methods, etc) just to try and find anything novel or good, since I am/was considering making a new harness but using all the best things from any of those. I still cannot find my perfect coding agent, every one of them has some problem or just not totally what it could be.

What I do is just point agents to a folder, have it loop around a few times on a repo, fact checks at the end, but people sometimes think the software/harness around the AI model doesn't do much which is TOTALLY wrong, its probably AS important or more.. file editing methods available matter a lot, context compaction methods... matter, caching matters. I am still fantasizing about a "best of N" coding agent, that tries to take all the best stuff from all of them.

I have an idea of a coding agent that puts a lot more effort into using more than one model at the same time. Sooo much can be done with that idea.. and no one is apparently doing it yet that I can find. I just am not sure I want to put that much time into a new coding agent project. I wonder how autonomous it could be - have weekly or daily scans of the current coding agent landscape and automatic scanning of coding agent/ai code related subreddits/hacker news, analyze it to figure out what the current problems are, complaints about existing coding agents, desires --> prioritized list of possible features/fixes ---> ai agents code and make releases

https://agents.buttonscli.com


Its not surprising, that doc is full of AI slop.

> Most agent loops reorder, rewrite, or inject fresh timestamps each turn

I haven't seen that, it'd be crazy slow if they did this. What "agent loops" are they talking about here specifically? The vagueness makes it sound potentially made up.


I've never seen an agent loop "reorder, rewrite, or inject fresh timestamps" each turn other than mostly towards the end of the messages. Messing with a large part of the context every turn would be a fairly crazy thing to do.

Yeah. Those claims are just some random AI slop from claude..

It's a really lazy one too - there are so many open source harnesses, including e.g. Codex and Kimi-CLI, and of course the leaked Claude Code source, so it's trivial to verify if someone even just bothered to ask an agent to check actual source code examples.

I couldn't find the link that they mentioned too. Maybe they forgot to actually put it?


I am working on a research institute for East Africa, https://maiyoinstitute.org/. I want to tackle the dire lack of environmental data, by using 1. low cost hardware 2. Artificial Intelligence 3. Long term horizon. The problem set is huge, but I am focusing on low cost sensors for Air and Water data collection plus bioacoustics for now.


I am using flash, and it's so good. 150M tokens at $2.


They mentioned that people like you would show up. "Push back on astroturfers. The "well, actually..." crowd is out in force. Don't let them set the narrative."


"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."

https://news.ycombinator.com/newsguidelines.html


Yeah, saw that; rubbed me wrong. "If you disagree you are manufactured, a shill." This kind of condescension has never been very convincing. And I mostly agree with the petition.


What do you currently use for json and batch, I was doing some analysis and my results show that gpt-oss-120b (non batch via openrotuer) is the best for now for my use case, better than gemini-flash models (batch on google). How is your experience?


Everything I do is json and of course you want that json in a specific format so that you can process it further.


Excellent resource. Small bug to report, the table here is broken (BANTU NEGROIDS section) https://britannica11.org/article/01-0358-africa/africa#secti.... Its quite fascinating to read what they thought about Africans as an African.


Thanks, nice catch. The tables can be tricky and I appreciate the heads-up on this markup leak. It will be corrected shortly.


From my casual glance, I can see only few images of particular spots and no timeline so that you can go back in history. Seems pretty rudimentary, like the 15 images you get from EOSDA LandViewer that you can only download a very low resolution thumbnail. Did you find the data helpful?


> Did you find the data helpful?

No.

Its frankly hilarious they think they can seriously put the words "SAR imagery from the world's largest SAR satellite constellation" on their homepage.

If money were being charged for it, some might call it "false advertising".

It looks to me more like a VERY limited subset of images from the satellite constellation in question.

Either that, or the constellation in question is minuscule.

Either way, something doesn't add up.


I’m all for stomping out bait and switch, but "SAR imagery from the world's largest SAR satellite constellation" does not imply that you will get all the imagery they have. Same as if i describe a liquid as “water from the Atlantic” it need not be a particularly impressive amount of water.

> Either way, something doesn't add up.

They are in the business of selling a particular type of data. They are not incentivised to give away their product for free. What you see here is the “first hit is free” kind of sample.


> They are not incentivised to give away their product for free. What you see here is the “first hit is free” kind of sample.

This is exactly what "bait and switch" means.

May I remind you that their website states "No registration. No paywall. Download and start working."


Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: