Google Gemini now creates music from text, images, and videos

Google's Gemini AI model can now generate music from text prompts, images, or videos. Generated tracks carry an invisible watermark that identifies them as AI-generated.

15 June 2026

Google has expanded its Gemini AI model to generate music. Gemini can now create tracks of roughly 30 seconds based on text prompts, images, or videos supplied by the user.

The new feature uses Lyria 3, Google DeepMind's generative music model. Users can specify genre, tempo, and vocal style in a text prompt, or leave the interpretation more open to the AI. Gemini can also take inspiration from uploaded images or videos and compose a matching soundtrack. According to Google, all AI-generated music is watermarked with its SynthID technology; the watermark is inaudible to listeners but can be detected to verify the track's origin.

Music generation in Gemini is currently in beta and available to adult users in English, German, Spanish, French, Hindi, Japanese, Korean, and Portuguese. Prompts can range from detailed requests such as "create a fast (120 BPM) soul-funk track with a warm, female soprano voice" to looser suggestions such as "90s trance music that starts slow but feels uplifting."

Lyria 3 is also being integrated into YouTube to support content creators. Its "Dream Track" feature lets creators add background music to their YouTube Shorts. Previously available only as an experiment in the US, the feature is now rolling out to other regions. Google says it intends to support YouTube creators with AI tools while complying with copyright law.

Original source: heise.de