The rise of synthetic data with Florian Hönicke from Jina AI

Description

Data is the fuel powering the AI revolution, but what do we do when there is simply not enough of it to satisfy the insatiable appetite of new model training? In this episode, Florian Hönicke, Principal AI Engineer at Jina AI, discusses using LLMs to generate synthetic data to ease the data bottleneck. He also addresses the potential risks of over-reliance on synthetic data.

German startup Jina AI is one of the many exciting companies coming out of Europe, supporting the development and commercialisation of generative AI. The team gained widespread attention in late 2023 for releasing the first open-source text embedding model with an 8,192-token context length. Jina-embeddings-v2 achieves state-of-the-art performance on a range of embedding-related tasks and matches the performance of OpenAI's proprietary text-embedding-ada-002 model.

Watch the video of our interview: https://youtu.be/AP80hZajk5w

More Episodes

Neuroscience and AI with Basis co-founder Emily Mackevicius

Emily Mackevicius is a co-founder and director of Basis, a nonprofit applied research organization focused on understanding and building intelligence while advancing society’s ability to solve intractable problems. Emily is a member of the Simons Society of Fellows, and a postdoc in the Aronov...

Published 04/15/24

Knowledge Distillation with Helen Byrne

Published 04/15/24

Stable Diffusion 3 with Stability AI's Kate Hodesdon

Stability AI’s Stable Diffusion model is one of the best known and most widely used text-to-image systems. The decision to open-source both the model weights and code has ensured its mass adoption, with the company claiming more than 330 million downloads. Details of the latest version - Stable...

Published 04/07/24