Good Data Matters

On November 14th, two days before we launched the AI Data CO-OP initiative, Google DeepMind released GraphCast, their graph neural network (GNN) model for weather forecasting. It is yet another elegant application of deep learning by DeepMind, and it deserves attention here because it is a great example of the impact of good data on AI training. DeepMind’s creative work with the GNN is significant, and you can see the models and even run them on the DeepMind web site, but let’s talk about the data.

Google leveraged 39 years of archived data from the European Centre for Medium-Range Weather Forecasts (ECMWF), covering 1979 through 2017, to train the GNN. Training took four weeks on 32 Cloud TPU v4s running in parallel. You may have read in the press that you can run the model in a few minutes on a laptop; that refers to the pretrained model, which you feed two recent weather states as input. The trained model was then tested against held-out data from 2018 through the present day, and the accuracy results beat current weather prediction methods.

The ECMWF is an independent organization that works with 35 European countries and employs 450 staff to collect and administer weather data and provide meteorological services to its members. The organization was established in 1975 and began collecting and curating data and sharing it with its partners. When they started, I am sure they had no expectations of what Google and DeepMind would be doing with their data in 2023, but we are blessed by their discipline and their diligence. What other areas are we missing today that will matter in the future?

dpd
