The performance of JAX + GPU is abysmal compared with an equivalent PyTorch implementation. This is sad but unsurprising. JAX might be relegated to TPU-only (for me) for a little while longer. That said, nothing touches JAX + TPU for large-scale perf (405B bby)