GROK IMAGINE PROMPTS

Start wild, stack vivid details, end with "hyper-detailed cinematic lighting, surreal atmosphere". Watch ordinary words explode into impossible worlds.

Grok is xAI's maximally truth-seeking AI, now supercharged under SpaceX, and Imagine is its built-in tool for turning text into stunning creative visuals.

The video shown was made with Grok Imagine using the Prompt Generator below. All of its prompt values are available as defaults in the generator, easy to explore and reuse.

This generator helps you specify details for lighting, camera, scene, style, colors, mood, weather, season, the subjects in the scene, and much more.

 

Make dreams come true, in minutes.

INTRODUCTION

Unlock stunning, high-quality videos in minutes with this Grok Imagine prompt generator.

By crafting ultra-specific video features, you gain full creative control. This tool helps you become a true master of Grok video creation. The video above was created with this tool, and every default value you see comes straight from that video, ready for you to explore, tweak, and reuse.

STEPS

Here is how you could build your first prompt.

  • Come up with your own idea and edit the sections or use the default values.
  • Click Generate to update the prompt data.
  • Click Copy (to clipboard) or Download (save it locally).
  • Before you pass the prompt to Grok Imagine, you can ask Grok to add special features to your prompt.
  • Go to Grok Imagine and paste the final JSON prompt.
  • Wait a few minutes and voilà!

FEATURES

You have the following features, with keyboard shortcuts.

  • [Alt-P] Previous Section.
  • [Alt-N] Next Section.
  • [Alt-G] Generate Prompt.
  • [Alt-C] Copy Prompt.
  • [Alt-D] Download Prompt.
  • [Alt-I] Import Prompt.

TIPS

If it's your first prompt, here are some tips.

  • Explore the default values to better understand what you can do.
  • Use the Download feature to save your changes locally.
  • Use the Import feature to load and edit your prompts.
  • When you have a valid prompt, use the Copy feature, then go to Grok Imagine, select the video settings, and paste in your prompt.
  • In a few minutes your video is done and you can see the results.
  • Enjoy!

OUTPUT

The output of this generator is JSON-formatted text that defines a structured representation of your prompt.

// A JSON object with a set of properties expressed as name:value pairs.
{
  "name1": "value1",
  "name2": "value2"
}


// A JSON list (array) of values.
[
  "value1", "value2"
]


// A JSON list (array) of objects.
[
  {
    "name1": "value1",
    "name2": "value2"
  },
  {
    "name1": "value1",
    "name2": "value2"
  }
]
You can see the full JSON prompt at the bottom of this page. Click the generate icon to update the JSON prompt after your changes.
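
If you want to check a downloaded prompt outside the generator, the three shapes above can be parsed with any JSON library. Here is a minimal sketch, assuming Python and its standard json module; the helper name is_valid_json is only illustrative. Note that the // lines in the examples above are explanations only. Standard JSON has no comment syntax, so they are left out of the text being parsed.

```python
import json

# The three shapes from the examples above, without the explanatory comments.
object_text = '{"name1": "value1", "name2": "value2"}'
list_text = '["value1", "value2"]'
list_of_objects_text = '[{"name1": "value1"}, {"name1": "value1"}]'

parsed_object = json.loads(object_text)            # becomes a dict
parsed_list = json.loads(list_text)                # becomes a list
parsed_objects = json.loads(list_of_objects_text)  # becomes a list of dicts


def is_valid_json(text: str) -> bool:
    """Return True if the text parses as standard JSON, False otherwise."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True
```

A check like this catches common hand-editing slips, such as a trailing comma after the last name:value pair, before you paste the prompt into Grok Imagine.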

USING IMAGES

In Grok Imagine, you can reference attached images in your prompt via the @ImageId tag.

Use the @ImageId tags in the generator section fields to point to attached images that serve as a baseline for a scene, character, subject, etc. Example:

{
  "scene": {
    "environment": "exterior looking like @Image1"
  }
}
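
Before you paste a prompt, it can help to list which image tags it references, so you know which images to attach. Here is a minimal Python sketch. It assumes the tags always look like @Image followed by a number, as in the example above; the function name find_image_tags is only illustrative and is not part of Grok Imagine.

```python
import re
from typing import Any

# Assumed tag shape: "@Image" followed by digits, as in "@Image1".
IMAGE_TAG = re.compile(r"@Image\d+")


def find_image_tags(value: Any) -> list[str]:
    """Collect the image tags found in every string of a nested prompt."""
    found: set[str] = set()

    def walk(node: Any) -> None:
        if isinstance(node, str):
            found.update(IMAGE_TAG.findall(node))
        elif isinstance(node, dict):
            for child in node.values():
                walk(child)
        elif isinstance(node, list):
            for child in node:
                walk(child)

    walk(value)
    return sorted(found)


prompt = {"scene": {"environment": "exterior looking like @Image1"}}
print(find_image_tags(prompt))  # ['@Image1']
```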

Meta

This section contains metadata about the prompt itself, including version, creator, and generation settings. It's useful for versioning, tracking, and configuring the underlying image generation model.

Indicates the version of this prompt structure for compatibility or updates.
  • Required: Yes, for tracking changes.
  • Values: Any version string (e.g., "1.0", "2.0", "beta").
  • Example: "2.0".
Identifies who created the prompt (e.g., for attribution).
  • Required: Optional, but good for documentation.
  • Values: Any name or identifier (e.g., "Grok AI", "User XYZ").
  • Example: "Grok AI".
Describes the high-level goal of the image (e.g., what type of generation).
  • Required: Yes, to guide the tool's focus.
  • Values: Common intents like "image_generation", "concept_art", "portrait", "landscape".
  • Example: "image_generation".
Provides additional context or instructions about the prompt's design.
  • Required: Optional, for elaboration.
  • Values: Any descriptive text.
  • Example: "Ultra-detailed structured JSON prompt for Grok Imagine to generate a highly intricate cyberpunk cityscape scene with maximum control over elements, styles, and technical parameters. Designed for photorealistic output with immersive depth and atmospheric effects."
Specifies the AI model or integration to use for generation.
  • Required: Yes, to ensure compatibility.
  • Values: Model names like "Grok Imagine with Flux.1 integration", "Stable Diffusion", "DALL-E".
  • Example: "Grok Imagine with Flux.1 integration".

Generation Parameters

Technical settings for the image generation process (e.g., diffusion model hyperparameters).

A random seed for reproducibility: the same seed and prompt produce the same image.
  • Required: Optional, defaults to random.
  • Values: Any non-negative integer (e.g., 0 to 2^32-1).
  • Example: 42.
Number of diffusion steps; higher values improve quality but take longer.
  • Required: Optional, defaults vary by model.
  • Values: Typically 20-100; higher for more detail.
  • Example: 50.
Classifier-Free Guidance scale; controls how closely the image follows the prompt (higher = stricter adherence).
  • Required: Optional, defaults around 7-8.
  • Values: 1.0 to 20.0; common range 5-10.
  • Example: 7.5.
Strength of denoising (for img2img modes); 0 = no change, 1 = full regeneration.
  • Required: Optional, used in variations.
  • Values: 0.0 to 1.0.
  • Example: 0.75.
The sampling algorithm for generation.
  • Required: Optional, defaults to model's standard.
  • Values: Model-specific, e.g., "Euler a", "DDIM", "LMS".
  • Example: "Euler a".
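
The ranges listed above can be checked before you generate. Here is a minimal Python sketch. The key names seed, steps, cfg_scale, and denoising_strength are assumptions made for illustration, since the exact JSON names are not shown here, and the upper seed bound simply follows the example range given above. Values are assumed to be numbers.

```python
def check_generation_parameters(params: dict) -> list[str]:
    """Return a list of problems; an empty list means all present values are in range."""
    problems = []

    # Every setting is optional, so only values that are present get checked.
    seed = params.get("seed")  # hypothetical key name
    if seed is not None and not (isinstance(seed, int) and 0 <= seed <= 2**32 - 1):
        problems.append("seed: expected an integer from 0 to 2^32-1")

    steps = params.get("steps")  # hypothetical key name
    if steps is not None and not (isinstance(steps, int) and steps >= 1):
        problems.append("steps: expected a positive integer")

    cfg_scale = params.get("cfg_scale")  # hypothetical key name
    if cfg_scale is not None and not 1.0 <= cfg_scale <= 20.0:
        problems.append("cfg_scale: expected a number from 1.0 to 20.0")

    denoising = params.get("denoising_strength")  # hypothetical key name
    if denoising is not None and not 0.0 <= denoising <= 1.0:
        problems.append("denoising_strength: expected a number from 0.0 to 1.0")

    return problems


defaults = {"seed": 42, "steps": 50, "cfg_scale": 7.5, "denoising_strength": 0.75}
print(check_generation_parameters(defaults))  # []
```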

Image

Defines the intended use or category of the generated image.
  • Required: Yes, to set context.
  • Values: Categories like "concept_art", "illustration", "photo", "logo".
  • Example: "concept_art".

Scene

This describes the overall environment and setting of the image.

Detailed description of the place.
  • Required: Yes.
  • Values: Any descriptive text.
  • Example: "futuristic megacity at night, sprawling urban landscape with towering skyscrapers, neon-lit streets, flying vehicles, and rainy atmosphere".
Type of setting (indoor/outdoor).
  • Required: Yes.
  • Values: "exterior", "interior", "abstract".
  • Example: "exterior".
Sets the temporal aspect for lighting/mood.
  • Required: Optional.
  • Values: "day", "night", "dawn", "midnight".
  • Example: "midnight".
Atmospheric conditions affecting visuals.
  • Required: Optional.
  • Values: Descriptive like "sunny", "rainy", "foggy".
  • Example: "heavy rain with puddles reflecting neon lights, misty fog rolling through alleys".
Seasonal influences on the scene.
  • Required: Optional.
  • Values: "summer", "winter", "autumn", "eternal urban winter".
  • Example: "eternal urban winter".

Style

Controls the artistic look and quality.

Overall aesthetic.
  • Required: Yes.
  • Values: Genres like "photorealistic cyberpunk", "cartoon", "oil painting".
  • Example: "photorealistic cyberpunk, inspired by Blade Runner and Ghost in the Shell".
Technical quality.
  • Required: Optional.
  • Values: "high-fidelity 8K resolution, ultra-detailed textures".
  • Example: "high-fidelity 8K resolution, ultra-detailed textures".
Inspirations from artists/films.
  • Required: Optional.
  • Type: Array of strings.
  • Values: Names like ["Ridley Scott cinematography"].
  • Example: ["Ridley Scott cinematography", "Syd Mead concept art", "digital painting techniques"].
Boosters for image quality.
  • Required: Optional.
  • Type: Array of strings.
  • Values: Terms like "masterpiece", "sharp focus".
  • Example: ["masterpiece", "best quality", "highly intricate", "sharp focus"].

Lighting

Defines light sources and effects for mood and realism.

Main light.
  • Required: Yes.
  • Values: Descriptive like "neon signs".
  • Example: "neon signs in vibrant colors (pink, blue, green, red)".
Additional lights.
  • Required: Optional.
  • Values: Any.
  • Example: "street lamps with warm orange glow, vehicle headlights cutting through rain".
Visual phenomena.
  • Required: Optional.
  • Type: Array of strings.
  • Values: "god rays", "reflections".
  • Example: ["god rays from billboards", "reflections on wet surfaces", "bloom and lens flare", "high contrast shadows"].
Light strength.
  • Required: Optional.
  • Values: "dramatic", "soft".
  • Example: "dramatic and moody".
Light angles.
  • Required: Optional.
  • Values: "multi-directional".
  • Example: "multi-directional from urban sources".

Camera

Simulates photographic settings.

Shot style.
  • Required: Yes.
  • Values: "cinematic", "portrait".
  • Example: "cinematic".
Lens type.
  • Required: Optional.
  • Values: "wide-angle 24mm", "telephoto".
  • Example: "wide-angle 24mm".
Zoom level.
  • Required: Optional.
  • Values: "short", "mid-range".
  • Example: "mid-range".
Viewpoint.
  • Required: Optional.
  • Values: "low-angle", "bird's eye".
  • Example: "low-angle shot looking up at skyscrapers".
Camera shot.
  • Required: Optional.
  • Values: "establishing_shot, wide_shot, long_shot, medium_shot, close_up, extreme_close_up, over_the_shoulder_shot, pov_shot, tilted_shot, tracking_shot, aerial_shot".
  • Example: "establishing_shot".
Focus blur.
  • Required: Optional.
  • Values: "shallow", "deep".
  • Example: "shallow, with foreground in sharp focus and background slightly blurred".
Layout rules.
  • Required: Optional.
  • Values: "rule of thirds".
  • Example: "rule of thirds, leading lines from streets to horizon".
How the scene is cropped.
  • Required: Optional.
  • Values: "full scene".
  • Example: "full scene with protagonist as focal point".

Mood

Emotional tone of the image.
  • Required: Yes.
  • Values: Descriptive like "intense, dystopian".
  • Example: "intense, dystopian, mysterious, high-tech noir".

Color Palette

Controls colors for visual harmony.

Main hues.
  • Required: Optional.
  • Type: Array of strings.
  • Values: Color names like "neon cyan".
  • Example: ["neon cyan", "deep purple", "electric blue"].
Highlight colors.
  • Required: Optional.
  • Type: Array of strings.
  • Values: As above.
  • Example: ["fiery red", "acid green"].
Overall color feel.
  • Required: Optional.
  • Values: "cool", "warm".
  • Example: "cool and saturated with high vibrancy".
Light/dark difference.
  • Required: Optional.
  • Values: "high", "low".
  • Example: "high".

Composition

Guides overall layout.

Structure of elements.
  • Required: Yes.
  • Values: "layered depth".
  • Example: "layered depth with foreground, midground, and background elements".
Symmetry.
  • Required: Optional.
  • Values: "asymmetrical", "symmetrical".
  • Example: "asymmetrical for tension".
Key areas of attention.
  • Required: Optional.
  • Type: Array of strings.
  • Values: Descriptive.
  • Example: ["protagonist's face", "neon reflections", "distant skyline"].

Background

Details the backdrop.

Overall background.
  • Required: Yes.
  • Values: Any.
  • Example: "endless array of megastructures with vertical gardens, elevated highways, and floating billboards".
Specific elements.
  • Required: Yes.
  • Type: Array of strings.
  • Values: Descriptive.
  • Example: ["rain-slicked surfaces", "steam vents from grates", "distant thunderclouds"].
Background softening.
  • Required: Optional.
  • Values: "subtle Gaussian blur".
  • Example: "subtle Gaussian blur".

Technical Specs

Output specifications.

Image size in pixels.
  • Required: Optional.
  • Values: "1024x1024", "4096x2160".
  • Example: "4096x2160".
Width:height ratio.
  • Required: Optional.
  • Values: "1:1", "16:9".
  • Example: "16:9".
Output type.
  • Required: Optional.
  • Values: "PNG", "JPEG".
  • Example: "PNG".
After-effects.
  • Required: Optional.
  • Type: Array of strings.
  • Values: "sharpen edges", "add film grain".
  • Example: ["sharpen edges", "add film grain", "color grading for teal-orange cinematic look"].
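
When you set both an image size and a width:height ratio, it is worth checking that they agree. Here is a minimal Python sketch that reduces a size string to its simplest ratio. The function name aspect_ratio is only illustrative, and how Grok Imagine handles a mismatch between the two values is not covered here.

```python
from math import gcd


def aspect_ratio(resolution: str) -> str:
    """Reduce a "WIDTHxHEIGHT" string to its simplest width:height ratio."""
    width_text, height_text = resolution.lower().split("x")
    width, height = int(width_text), int(height_text)
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


print(aspect_ratio("1024x1024"))  # 1:1
print(aspect_ratio("3840x2160"))  # 16:9
print(aspect_ratio("4096x2160"))  # 256:135
```

Note that "4096x2160" reduces to 256:135 (roughly 1.9:1), not 16:9. If you want an exact 16:9 frame at that height, "3840x2160" is the matching size.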

Subjects

An array of objects describing key elements (people, objects) in the scene. Each subject is an object.
  • Required: Yes, for populating the image.
  • Type: Array of objects.
  • Values: Multiple subjects; at least one recommended.

This array accepts one "primary" entry, any number of "secondary" entries, and one "background_elements" entry.

Subject Objects

Each subject object has:

Type*

Categorizes the subject (e.g., main focus vs. background).
  • Required: Yes.
  • Values: "primary", "secondary", "background_elements".
  • Examples: "primary", "secondary".

Description*

Detailed textual depiction.
  • Required: Yes.
  • Values: Any descriptive text.
  • Examples: "cyberpunk hacker protagonist, mid-20s Asian female with cybernetic enhancements...", "swarm of surveillance drones...".

Position*

Placement in the composition.
  • Required: Yes.
  • Values: "foreground center", "midground", "background".
  • Examples: "foreground center, standing on a wet sidewalk".

Pose*

Body position or action.
  • Required: Optional in some subjects.
  • Values: Descriptive like "dynamic", "sitting".
  • Example: "dynamic, looking over shoulder at approaching drone".

Expression

Facial emotion.
  • Required: Optional.
  • Values: "happy", "determined".
  • Example: "determined and vigilant".

Count

Number of instances.
  • Required: Optional.
  • Type: Integer or string like "multiple".
  • Values: 1+, or "multiple".
  • Examples: 5, "multiple".

Details (sub-object, optional):

Fine-grained attributes.
  • Type: Object.

Sub-parameters vary by subject, e.g.:
  • Hair: Description of hair (string, e.g., "short neon-pink bob").
  • Clothing: Outfit details (string).
  • Accessories: Items (string).
  • Size: For objects (string, e.g., "small quadcopters").
  • Features: Specific traits (string).
  • Diversity: Variety in elements (string).

Audio

Defines audio elements for video generation. It's particularly useful when the intent is "video_generation" (as updated in the meta section), allowing control over sounds, music, effects, and dialogue to create an immersive audiovisual experience.

Describes background noises that set the environment's atmosphere, enhancing immersion without overpowering other audio.
  • Required: Optional.
  • Values: Any comma-separated or descriptive text of sounds (e.g., "rain falling, wind howling"). Keep it concise; the model interprets and generates realistic audio clips. Avoid overly complex lists to prevent muddled output.
  • Example: "rain falling, distant traffic hum, neon buzzing, footsteps on wet pavement".
Specifies the background music track or style, which plays throughout or in segments to match the mood.
  • Required: Optional.
  • Values: Any descriptive text of a music style or track (e.g., "synthwave soundtrack"). Keep it concise; the model interprets the description and generates matching audio. Avoid overly complex descriptions to prevent muddled output.
  • Example: "synthwave soundtrack with pulsating bass and ethereal synths, moody and atmospheric".
Lists specific, event-triggered sounds that sync with visual actions (e.g., explosions, beeps).
  • Required: Optional.
  • Values: Comma-separated descriptions (e.g., "explosion, door creak"). These are tied to scene elements; the model times them based on animations or timestamps if provided elsewhere.
  • Example: "drone whirring, holographic interface beeps, thunder rumbles".
Controls the relative loudness of audio categories to balance the mix (e.g., make dialogue clearer than background noise).
  • Required: Optional, defaults to balanced (around 0.5-0.7).
  • Type: Object with key-value pairs.
  • Values: Keys correspond to audio categories (e.g., "ambient", "music"); values are floats from 0.0 (silent) to 1.0 (full volume). Add custom keys if needed (e.g., "narration").
  • Example: An object like: {"ambient": 0.6, "music": 0.4, "dialogue": 0.8, "effects": 0.7}
An array of objects describing spoken lines by subjects, including timing, voice style, and lip-sync. This enables narrative elements in the video.
  • Required: Optional.
  • Type: Array of objects.
  • Values: An array (0+ items) where each object represents a dialogue instance. Order them chronologically for best syncing. If no dialogue, use an empty array [].
  • Example: An array with two objects.
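
The volume levels object described above takes values from 0.0 (silent) to 1.0 (full volume). Here is a minimal Python sketch that lists the categories falling outside that range. The function name check_volume_levels is only illustrative.

```python
def check_volume_levels(levels: dict) -> list[str]:
    """Return the categories whose level is not a number from 0.0 to 1.0."""
    return [
        name
        for name, level in levels.items()
        if isinstance(level, bool)               # true/false is not a volume
        or not isinstance(level, (int, float))   # text such as "loud" is rejected
        or not 0.0 <= level <= 1.0
    ]


levels = {"ambient": 0.6, "music": 0.4, "dialogue": 0.8, "effects": 0.7}
print(check_volume_levels(levels))  # []
```

Custom keys such as "narration" pass the same check, since only the values are inspected.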

Dialogue Objects

Each dialogue object has the following sub-parameters:

Subject*

Identifies which subject (from the "subjects" array) is speaking. Links audio to visuals for lip-sync.
  • Required: Yes (for each dialogue object).
  • Values: References like "primary", "secondary", or more descriptive secondary ids (e.g., "secondary_drone"). Must match a subject's "type" or be descriptive if not exact.
  • Example: "primary" or "secondary_drone".

Timestamp Seconds*

Specifies when the dialogue starts in the video timeline (for precise syncing).
  • Type: Float or integer.
  • Required: Yes, to avoid random placement.
  • Values: Non-negative number (e.g., 0.0 to video duration). Use decimals for sub-second precision. Should be less than or equal to "duration_seconds" in generation_parameters.
  • Example: 2.5 or 6.0.

Text*

The actual spoken words.
  • Required: Yes.
  • Values: Any dialogue text. Keep short (under 50 words per instance) for natural delivery. Supports multiple languages if the model handles them.
  • Example: "They're closing in... I need to hack the grid now." or "Target acquired. Initiating scan.".

Voice

Describes the voice characteristics for text-to-speech generation.
  • Required: Optional, defaults to neutral.
  • Values: Descriptive like "male, deep, authoritative" or "female, young, excited". Include accents (e.g., "British"), effects (e.g., "echoey", "robotic"), or age/gender qualifiers.
  • Example: "female, mid-20s, determined tone with slight echo from earpiece" or "robotic, modulated, emotionless".

Lip Sync

Enables automatic lip movement syncing for the subject (if it's a character with a visible mouth).
  • Type: Boolean.
  • Required: Optional, defaults to false.
  • Values: true or false. Set to true for humanoid subjects; false for non-speaking elements like robots without lips.
  • Example: true or false.
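
The dialogue rules above (a speaker, a start time within the video, chronological order, and short text) can be sketched as a check. This is a minimal Python sketch under assumptions: the key names subject, timestamp_seconds, text, voice, and lip_sync are guessed from the headings above, and the 10-second duration in the example is made up for illustration.

```python
def check_dialogue(dialogue: list[dict], duration_seconds: float) -> list[str]:
    """Return a list of problems; an empty list means every line follows the rules."""
    problems = []
    previous_start = 0.0
    for index, line in enumerate(dialogue):
        if not line.get("subject"):
            problems.append(f"line {index}: missing subject")

        start = line.get("timestamp_seconds")  # hypothetical key name
        if not isinstance(start, (int, float)) or not 0 <= start <= duration_seconds:
            problems.append(f"line {index}: timestamp outside the video duration")
        else:
            if start < previous_start:
                problems.append(f"line {index}: not in chronological order")
            previous_start = max(previous_start, start)

        text = line.get("text", "")
        if not text:
            problems.append(f"line {index}: missing text")
        elif len(text.split()) >= 50:
            problems.append(f"line {index}: text should stay under 50 words")
    return problems


dialogue = [
    {
        "subject": "primary",
        "timestamp_seconds": 2.5,
        "text": "They're closing in... I need to hack the grid now.",
        "voice": "female, mid-20s, determined tone with slight echo from earpiece",
        "lip_sync": True,
    },
    {
        "subject": "secondary_drone",
        "timestamp_seconds": 6.0,
        "text": "Target acquired. Initiating scan.",
        "voice": "robotic, modulated, emotionless",
        "lip_sync": False,
    },
]
print(check_dialogue(dialogue, duration_seconds=10))  # []
```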

Negative Prompts

Specifies what to avoid in the image.

Undesired styles/objects.
  • Required: Optional.
  • Type: Array of strings.
  • Values: "cartoonish styles", "blurry".
  • Example: ["cartoonish styles", "low resolution", "blurry", "overexposed"].
Bad attributes.
  • Required: Optional.
  • Type: Array of strings.
  • Values: "poor anatomy", "muted colors".
  • Example: ["poor anatomy", "deformed faces", "extra limbs", "muted colors"].

Prompt


Click the generate button to get the JSON prompt for Grok Imagine.
