
Just a heads-up that there is a similar free Python package for this called Herbie [1], though the syntax of this one looks a little easier to use.

[1]: https://herbie.readthedocs.io/en/stable/



Interesting!

This is a little different though.

This dataset is part of the AWS Open Data program, so it is freely available from S3. Running the API within AWS gives you a massive latency and bandwidth advantage.

So with GribStream you are pushing the computation closer to where the big data is and only downloading the small data. And GribStream uses a custom grib2 parser that allows it to extract the data in a streaming fashion, using very little memory.
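A minimal sketch of the streaming idea, under stated assumptions: it decodes GRIB2 data packed with simple packing (Data Representation Template 5.0), where each stored integer X becomes Y = (R + X·2^E) / 10^D, reads the packed bits chunk by chunk, keeps only the requested grid indices, and stops pulling input once the last wanted index is decoded. This is not GribStream's parser. `iter_packed_ints` and `extract_points` are hypothetical names, bitmaps are ignored, and NBM/GFS messages may use other packing schemes (such as complex packing) that this does not handle.

```python
from typing import Dict, Iterable, Iterator, Sequence


def iter_packed_ints(chunks: Iterable[bytes], nbits: int) -> Iterator[int]:
    """Yield unsigned `nbits`-wide integers, most significant bit first,
    from a stream of byte chunks without buffering the whole message."""
    if nbits <= 0:
        raise ValueError("nbits must be positive")
    mask = (1 << nbits) - 1
    acc = 0       # bit accumulator
    acc_bits = 0  # number of not-yet-consumed bits held in acc
    for chunk in chunks:
        for byte in chunk:
            acc = (acc << 8) | byte
            acc_bits += 8
            while acc_bits >= nbits:
                acc_bits -= nbits
                yield (acc >> acc_bits) & mask
            acc &= (1 << acc_bits) - 1  # keep only the unconsumed low bits


def extract_points(
    chunks: Iterable[bytes],
    nbits: int,
    ref_value: float,   # R: reference value
    bin_scale: int,     # E: binary scale factor
    dec_scale: int,     # D: decimal scale factor
    wanted: Sequence[int],
) -> Dict[int, float]:
    """Decode only the grid indices in `wanted`, then stop reading input."""
    targets = set(wanted)
    last = max(targets)
    two_e = 2.0 ** bin_scale
    ten_d = 10.0 ** dec_scale
    out: Dict[int, float] = {}
    for i, x in enumerate(iter_packed_ints(chunks, nbits)):
        if i in targets:
            # Simple packing: Y * 10^D = R + X * 2^E
            out[i] = (ref_value + x * two_e) / ten_d
        if i >= last:
            break  # early abort: later chunks are never requested
    return out
```

Because `chunks` can be any iterable of bytes (for example a streamed `requests` response's `iter_content()`), breaking out of the loop means the rest of the message is simply never read, and memory use stays proportional to the number of wanted points rather than the grid size.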

It makes a huge difference if you need to extract timeseries of a handful of coordinates for months at a time.
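Pulling a timeseries for a handful of coordinates comes down to mapping each latitude/longitude to a position in the flattened grid once, then reading that same position from every forecast file. A minimal sketch, assuming a GFS-style regular 0.25° global latitude/longitude grid (1440 × 721 points, first point at 90°N, 0°E, scanning west-to-east then north-to-south); `nearest_index` is a hypothetical helper, and a projected grid such as NBM's would need the map projection instead.

```python
def nearest_index(
    lat: float,
    lon: float,
    lat0: float = 90.0,   # latitude of the first (northernmost) row
    lon0: float = 0.0,    # longitude of the first column
    dlat: float = 0.25,
    dlon: float = 0.25,
    nx: int = 1440,       # points per row
    ny: int = 721,        # number of rows
) -> int:
    """Flat index of the grid point nearest to (lat, lon).

    Assumes rows run north-to-south from lat0 and points run west-to-east
    from lon0, wrapping around at 360 degrees."""
    j = round((lat0 - lat) / dlat)                 # row index
    i = round(((lon - lon0) % 360.0) / dlon) % nx  # column index, wrapped
    if not 0 <= j < ny:
        raise ValueError(f"latitude {lat} is outside the grid")
    return j * nx + i


# Compute once per location, then reuse for every forecast file in the timeseries.
print(nearest_index(40.7128, -74.0060))  # 284824
```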

Cheers!


> And GribStream uses a custom grib2 parser that allows it to extract the data in a streaming fashion, using very little memory.

How does this compare to using Xarray on a netCDF dataset?


Grib2 files are much more compact than netCDF, just less convenient to use. But GribStream takes care of that and simply returns the timeseries for the coordinates you need.

Besides using the usual index files so that HTTP range requests fetch only the weather parameters of interest, GribStream also avoids allocating big memory buffers to decode/decompress the whole grid. It decodes in a streaming fashion and only accumulates the values being looked for, so it is very efficient. It doesn't even finish downloading the partial grib file; it aborts early. It also skips many headers and parts of the grib2 format that are not really required, or that can be assumed to be constant across the whole dataset. In other words, it cuts every possible corner, and the parser is (currently) optimized specifically for the NBM and GFS datasets.
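The index-file step can be sketched as follows, assuming the wgrib2-style inventory format NOAA publishes alongside GRIB2 files (`.idx` files with colon-separated `record:offset:date:variable:level:forecast:` lines). A message's bytes run from its offset up to just before the next message's offset, so each matching message becomes one HTTP range. `idx_byte_ranges` is a hypothetical helper and the sample offsets are made up for illustration.

```python
from typing import List, Optional, Set, Tuple


def idx_byte_ranges(idx_text: str, wanted: Set[Tuple[str, str]]) -> List[str]:
    """Map (variable, level) pairs to HTTP Range header values using a .idx inventory."""
    entries = []
    for line in idx_text.strip().splitlines():
        # record:offset:date:variable:level:forecast:
        fields = line.split(":")
        entries.append((int(fields[1]), fields[3], fields[4]))

    ranges = []
    for k, (start, var, level) in enumerate(entries):
        if (var, level) not in wanted:
            continue
        # A message ends just before the next entry with a larger offset
        # (sub-messages numbered like 5.1, 5.2 share one offset).
        end: Optional[int] = next((o - 1 for o, _, _ in entries[k + 1:] if o > start), None)
        ranges.append(f"bytes={start}-{end}" if end is not None else f"bytes={start}-")
    return ranges


SAMPLE_IDX = """\
1:0:d=2024010100:PRMSL:mean sea level:anl:
2:1000:d=2024010100:TMP:2 m above ground:anl:
3:2500:d=2024010100:UGRD:10 m above ground:anl:
4:4000:d=2024010100:VGRD:10 m above ground:anl:
"""

print(idx_byte_ranges(SAMPLE_IDX, {("TMP", "2 m above ground"), ("VGRD", "10 m above ground")}))
# ['bytes=1000-2499', 'bytes=4000-']
```

Each returned value can be sent as a standard HTTP `Range` header on a GET against the GRIB2 object, which S3 honors, so only the selected messages cross the network.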

That said, I intend to support several others, like the Rapid Refresh (RAP) model.

And because this process runs close to the data (in AWS), it runs way faster than it could anywhere else.


blaylockbk/Herbie: https://github.com/blaylockbk/Herbie :

> Download numerical weather prediction datasets (HRRR, RAP, GFS, IFS, etc.) from NOMADS, NODD partners (Amazon, Google, Microsoft), ECMWF open data, and the Pando Archive System

The Herbie docs mention a "GFS GraphCast" but not yet GenCast? https://herbie.readthedocs.io/en/stable/gallery/noaa_models/...

"GenCast predicts weather and the risks of extreme conditions with state-of-the-art accuracy" (2024) https://deepmind.google/discover/blog/gencast-predicts-weath...

"Probabilistic weather forecasting with machine learning" (2024) ; GenCast paper https://www.nature.com/articles/s41586-024-08252-9

google-deepmind/graphcast: https://github.com/google-deepmind/graphcast

Are there error and cost benchmarks for these predictive models?
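For context on what such error benchmarks usually measure, here is a minimal sketch of two standard deterministic scores, root-mean-square error and mean bias, computed over paired forecast/observation values. The function names are hypothetical, and probabilistic models like GenCast are also commonly scored with ensemble metrics such as CRPS, which this does not compute.

```python
import math
from typing import Sequence


def _check(forecast: Sequence[float], observed: Sequence[float]) -> None:
    if len(forecast) != len(observed) or len(forecast) == 0:
        raise ValueError("need two non-empty sequences of equal length")


def rmse(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Root-mean-square error: penalizes large misses more than small ones."""
    _check(forecast, observed)
    return math.sqrt(sum((f - o) ** 2 for f, o in zip(forecast, observed)) / len(forecast))


def mean_bias(forecast: Sequence[float], observed: Sequence[float]) -> float:
    """Average signed error: positive means the forecast runs high."""
    _check(forecast, observed)
    return sum(f - o for f, o in zip(forecast, observed)) / len(forecast)
```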


Or, much simpler and with many more models: https://open-meteo.com/


As of today, GribStream.com also leverages the NOAA Rapid Refresh (RAP) model https://rapidrefresh.noaa.gov/. It enables SkewT LogP charting, for which I added a Python example to the GitHub repo. I hope you find it useful.

You can check the example here: https://github.com/GribStream/python-client


Definitely more models right now. GribStream.com will be supporting many other models soon.

But open-meteo free access is only for non-commercial use. GribStream.com allows any use.

Also, can open-meteo query forecasts 10 days out at hourly resolution for 150,000 coordinates in a single request and take just 8 seconds? At what price?
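For scale, a quick back-of-the-envelope calculation of that request, assuming one value per coordinate per hour over 240 hourly steps (hour 0 ignored) and a single weather parameter:

```python
points = 150_000          # coordinates in one request
hours = 10 * 24           # 10 days at hourly resolution (hour 0 ignored)
seconds = 8               # reported wall-clock time

values = points * hours   # values per weather parameter
print(values)             # 36000000
print(values / seconds)   # 4500000.0
```

That is roughly 36 million values per parameter, or about 4.5 million values per second; each extra parameter multiplies the total.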

I'll do a benchmark soon.



