[
    {
        "aspect": "Object Detection",
        "introduction": "Identifying the presence of individual objects within the image, regardless of their type or context.",
        "guidance": "When generating a prompt focused on object detection, clearly specify each individual object and ensure they are distinct and easily identifiable within the image. For example, if requesting an image with a cat, a ball, and a tree, ensure these elements are mentioned explicitly: \"An image showing a cat sitting next to a colorful ball under a large tree.\" Emphasize the separation and distinction of objects by indicating their positions or attributes: \"The cat should have a visible collar, the ball should be brightly colored, and the tree should have lush green leaves.\" To balance different elements within the image, ensure that no objects are obscured or visually neutralized by other elements: \"Place the cat in the foreground, the ball to its right, and the tree in the background to allow clear visibility of each object.\" Avoid common pitfalls by ensuring objects are not too blended or abstract, which can confuse their identification: \"Make each object\u2019s form and color sharply defined to avoid blending into the background or other objects.\" This level of specificity will ensure each object is individually detected and clearly presented within the generated image."
    },
    {
        "aspect": "Object Classification",
        "introduction": "Determining the category or type of each detected object (e.g., car, tree, person).",
        "guidance": "To accurately represent object classification in an image generation prompt, specify clear and distinct categories for each object and describe their visual characteristics. For instance, start by defining the primary objects to be included, such as cars, trees, and people. Outline the number and arrangement of each object type, and specify their locations relative to each other. Ensure that the attributes of each category are explicit, such as \"a red sports car parked on the right side of the road,\" \"two tall oak trees with lush green leaves standing beside the car,\" and \"a person walking towards the car carrying a shopping bag.\" Balance the scene by ensuring that objects are spaced naturally, avoiding clutter or unrealistic proximity. Emphasize the importance of distinguishing between object types by their roles and appearances in the scene to prevent misinterpretation, such as a car being mistaken for a tree or a person. This approach will help the AI generate a coherent image with correctly classified objects."
    },
    {
        "aspect": "Object Localization",
        "introduction": "Pinpointing the exact location of each detected object within the image using bounding boxes or pixel masks.",
        "guidance": "Clearly define each object's position within the image by using bounding boxes or pixel masks. For bounding boxes, specify coordinates or relative positions (e.g., \"a cat in the upper left quadrant, a tree in the bottom right corner\"). For pixel masks, illustrate shapes around objects that precisely follow their contours. Ensure objects are well-defined against their backgrounds, with distinct margins and consistent sizes. Accurately balance the spatial arrangement to avoid overlapping or crowded compositions. Emphasize clarity by avoiding ambiguous or overly complex scenes."
    },
    {
        "aspect": "Object Segmentation",
        "introduction": "Providing a more detailed boundary around each object by segmenting them from the background at the pixel level.",
        "guidance": "Clearly delineate each primary object within the image, ensuring their edges are sharply defined against the background. For each object, specify details such as shape, color, and texture to differentiate them precisely from their surroundings. Maintain consistent illumination and perspective across all objects to ensure seamless integration. Use high contrast between objects and the background to enhance segmentation clarity. Avoid blending or overlapping objects in a way that obscures their distinct boundaries. Example prompt: \"A vibrant butterfly hovering over a bed of colorful flowers, with each flower petal and butterfly wing meticulously outlined and segmented from the lush green background, emphasizing detailed textures and defined borders at the pixel level.\""
    },
    {
        "aspect": "Object Attribute Recognition",
        "introduction": "Identifying specific attributes of detected objects such as color, shape, size, or material.",
        "guidance": "When crafting an image generation prompt focusing on Object Attribute Recognition, it is crucial to detail the specific attributes of each object within the scene. For example, indicate the color (e.g., \"a red apple\"), shape (e.g., \"a round table\"), size (e.g., \"a large tree next to a small house\"), and material (e.g., \"a wooden chair\"). Describe these attributes in relation to each object to ensure clarity. Ensure that the attributes are distinct and easily recognizable in the image, avoiding vague descriptions like \"a beautiful flower\"; instead, specify \"a vibrant purple tulip with delicate petals.\" To maintain balance, be consistent with the level of detail across all objects, avoiding overly elaborate descriptions that could clutter the scene. Additionally, avoid conflicting attributes that may confuse the model, such as \"a small, towering building.\" This approach ensures that the generated image is coherent and accurately represents the specified attributes of each object."
    },
    {
        "aspect": "Multi-object Interaction",
        "introduction": "Analyzing and understanding how multiple objects within the image interact or relate to each other.",
        "guidance": "Ensure that the image prompt clearly specifies the types of objects involved and illustrates their interactions or relationships. Describe the primary objects, their actions, and how these actions correlate. Use dynamic verbs to convey interactions (e.g., \"holding,\" \"passing,\" \"avoiding\"). Indicate the spatial arrangement and proximity of each object to illustrate their relationship (e.g., \"a cat leaping over a sleeping dog\"). Highlight any shared features or common activities to establish a connection (e.g., \"a child offering a flower to a smiling grandmother\"). Provide context or a setting that supports the interaction, ensuring that the environment enhances the relationship between the objects without distracting from the main interaction. Be mindful of proportionality and ensure objects are sized and positioned to reflect realistic or purposeful interactions. Avoid overcrowding the scene with too many details that could obscure the primary interaction."
    },
    {
        "aspect": "Scene Classification",
        "introduction": "Determining the general context or type of scene depicted in the image (e.g., park, city street).",
        "guidance": "Clearly specify the primary setting or environment that characterizes the scene. For example, if depicting a park, include elements such as lush green lawns, trees, park benches, walking paths, and perhaps a pond or fountain. Balance these elements by placing the larger features like trees and lawns prominently and distributing smaller details like benches and paths throughout the scene. Additionally, define any human or animal activity that typically occurs in this setting, such as people jogging, children playing, or dogs being walked. Avoid mixing elements from vastly different contexts (e.g., city elements in a natural park) to maintain coherence and ensure the scene classification is unambiguous."
    },
    {
        "aspect": "Scene Parsing",
        "introduction": "Breaking down the scene into its constituent elements and understanding their spatial arrangement.",
        "guidance": "Describe the scene by identifying each major element and its specific position within the scene. Clearly outline the relationships and interactions between elements. For example, specify the location of objects such as \"a wooden cabin in the center, surrounded by tall pine trees on the left and right.\" Include spatial orientation details, such as \"a pathway winding from the bottom left corner to the cabin's front door.\" Ensure elements are distinct and not overcrowded by stating \"a clear blue sky overhead with a few scattered clouds,\" to avoid visual clutter. Use descriptors that detail the depth and perspective, like \"a flowing river in the background, behind the cabin and trees,\" to create a coherent image. Avoid ambiguity by providing specific placements and avoid overly general statements like \"some trees around.\" Prioritize clarity in spatial arrangement to ensure a well-balanced and integrated scene."
    },
    {
        "aspect": "Activity Recognition",
        "introduction": "Recognizing activities or events taking place within the scene, such as people playing sports or cars driving.",
        "guidance": "When generating a prompt that incorporates activity recognition, clearly specify the primary activity taking place, the participants involved, and any relevant objects or settings that enhance the context. For example, \"a group of children playing soccer on a grassy field\" explicitly describes both the activity (children playing soccer) and the setting (grassy field). Ensure that additional elements, such as equipment or environmental details, are mentioned in a way that they support but don\u2019t overshadow the main activity. Balance the image by placing the primary activity prominently in the midground, with secondary elements in the background or foreground as necessary to provide context. Avoid generic descriptions and ensure that the action verbs and details are specific to enable clear visual representation. For example, instead of \"a person doing sports,\" describe \"a woman performing yoga on a mat in a serene park.\""
    },
    {
        "aspect": "Environmental Context",
        "introduction": "Understanding the broader environmental context, such as identifying weather conditions or time of day.",
        "guidance": "Clearly specify the environmental context by detailing the weather conditions (e.g., sunny, rainy, foggy, snowy) and the time of day (e.g., dawn, morning, afternoon, twilight, night). Ensure to include visual cues that align with these conditions, such as lighting, shadows, and atmospheric effects. For example, describe the sun's position in the sky for daytime settings or the presence of stars and moonlight for nighttime. Mention any environmental elements like puddles for rain, snow-covered ground for snowy conditions, or fog enveloping objects for foggy settings. Balance the focus between the primary subject and the environmental context to create a coherent image\u2014avoid making the weather conditions or time of day too overpowering. Additionally, use adjectives that vividly describe these conditions to ensure clarity, and avoid ambiguous terms that might lead to misinterpretation."
    },
    {
        "aspect": "Spatial Relationships",
        "introduction": "Understanding the spatial relationships between different objects and elements within the scene.",
        "guidance": "When creating a prompt that emphasizes spatial relationships, clearly define the positioning of each object or element relative to one another. Specify locations using terms like \"to the left of,\" \"above,\" \"in front of,\" or \"beside.\" Ensure a coherent layout by describing the distances between objects, such as \"a few feet apart\" or \"touching.\" Balance the composition by considering the overall arrangement; for example, \"a large tree in the center with a picnic blanket directly below it, surrounded by scattered flowers to the left and right.\" Avoid vague descriptions and ensure that the spatial relationships enhance the story or aesthetic of the scene."
    },
    {
        "aspect": "Scene Dynamics",
        "introduction": "Interpreting the potential dynamics within the scene, such as predicting future movements or changes based on the current state.",
        "guidance": "Ensure the image captures a moment in action, showing the interactions or movements of subjects that suggest imminent or ongoing change. Depict elements such as flowing water, wind-blown trees, running animals, or people mid-gesture to indicate motion. Balance the image by placing dynamic elements in contrast with more static elements to highlight the intensity and direction of movement. Make use of visual cues like motion blur or dynamic lighting to emphasize the sense of activity. Avoid overcrowding the scene with too many moving parts which might confuse the viewer, and ensure the primary action is clear and central to the composition."
    },
    {
        "aspect": "Human Detection",
        "introduction": "Identifying the presence of human figures within the image.",
        "guidance": "To craft a prompt that effectively incorporates human detection, specify the number, positioning, and activity of human figures within the scene. Clearly describe their size and appearance, including clothing, facial expressions, and interactions with the surroundings. Balance these details by providing context on the environment they are situated in, ensuring it complements the human figures without overshadowing them. Use specific descriptors to avoid generalization, such as \"a single woman in a red dress standing in the center of a bustling city street,\" or \"a group of children playing in a sunlit park with trees and a playground in the background.\" Avoid vague terminology that might lead to misinterpretations, ensuring the human presence is prominent and easily identifiable within the image."
    },
    {
        "aspect": "Facial Recognition",
        "introduction": "Detecting and identifying individual faces within the image, including attributes like emotions and expressions.",
        "guidance": "Specify the total number of faces and their unique positions within the image, ensuring each face is clearly visible and distinguishable from others. Include detailed descriptions of facial features, age, gender, and distinctive attributes for each face. Clearly indicate the emotions or expressions each person should be displaying (e.g., a smiling woman, a surprised child, an angry man). Describe the overall context or setting, such as a social gathering, a classroom, or a public space, to ensure the faces appear in a coherent environment. Ensure faces are unobstructed by other elements and are well-lit to facilitate recognition. Avoid overly complex backgrounds that may distract from the focus on faces, and ensure diversity in the faces to reflect a wide range of human expressions and demographics."
    },
    {
        "aspect": "Pose Estimation",
        "introduction": "Estimating the pose or position of human figures, such as standing, sitting, or lying down.",
        "guidance": "Specify the exact pose or position of the human figure, describing the orientation and key body parts in detail. For example, if the figure is standing, mention whether they are standing straight, with one leg bent, or leaning against an object. If the figure is sitting, indicate the type of seat, posture, and the placement of hands and legs, such as sitting upright on a bench with hands on knees or leaning back on a sofa with legs crossed. Ensure that the described pose aligns with the overall scene's context to maintain coherence, such as a figure lying down on a beach towel with hands behind the head in a relaxed manner. Avoid ambiguous terms like \"in a dynamic position\" without further detail. Additionally, consider the figure's interaction with surrounding elements, ensuring these elements reflect and complement the described pose, such as a hand resting on a prop or an outstretched arm reaching toward an object."
    },
    {
        "aspect": "Gesture Recognition",
        "introduction": "Identifying specific gestures made by humans, such as waving or pointing.",
        "guidance": "To generate an image that accurately depicts gesture recognition, specify the precise gesture you want to be depicted, such as waving, pointing, or thumbs up. Clearly describe the position and orientation of the hand, fingers, and arm relative to the body. Include details about the setting and context to help integrate the gesture naturally into the scene. Define the subject's expression to align with the gesture, ensuring coherence between the gesture and the facial emotion. Mention the clothing and any accessories to be worn by the subject to give context to their gesture. Avoid ambiguity by detailing the action being performed, ensuring the gesture is clearly defined and understandable. For example, \"A person in a busy street market, standing and waving with their right hand raised high, smiling warmly while wearing a casual outfit.\""
    },
    {
        "aspect": "Clothing and Accessories",
        "introduction": "Recognizing and detailing the clothing and accessories worn by individuals, including their types and styles.",
        "guidance": "When crafting a prompt to include specific clothing and accessories, ensure that you provide detailed descriptions of each item worn by the individuals in the image. Specify the types of clothing (e.g., a flowing, red chiffon dress, a tailored black suit, a casual denim jacket) and accessories (e.g., a pearl necklace, a leather watch, aviator sunglasses). Mention the style or theme if relevant (e.g., vintage 1920s, modern streetwear, traditional Japanese attire). Balance the visual elements by distinguishing key clothing items from accessories and ensuring they are complementary. For multiple individuals, describe each person\u2019s attire to maintain coherence (e.g., \"A woman in a flowing, red chiffon dress with a pearl necklace, standing next to a man in a tailored black suit and leather watch\"). Avoid common pitfalls by clearly differentiating each accessory and piece of clothing, ensuring they are not overly generic, and maintaining a consistent style within the imagery."
    },
    {
        "aspect": "Character Interaction",
        "introduction": "Understanding interactions and relationships between different human figures within the image, such as conversations or social activities.",
        "guidance": "When crafting a prompt to illustrate character interaction, specify the actions and positions of each human figure relative to one another. For instance, describe one character speaking animatedly to another who is listening attentively, both standing close to each other in a bustling marketplace. Include details on their posture, facial expressions, gestures, and context indicating their relationship, like friends discussing weekend plans. Balance the characters such that they occupy prominent, central positions within the scene, ensuring their interaction is the focal point. Avoid vague descriptions that could result in disconnected or unrelated figures. Instead, emphasize the dynamics of their relationship through specific, vivid actions and reactions."
    },
    {
        "aspect": "Contextual Inference",
        "introduction": "Inferring additional context based on the objects and scene elements present, such as deducing activities or purposes.",
        "guidance": "When crafting a prompt for image generation, clearly specify the primary objects and scene elements, and include descriptive details that suggest their activities or purposes. For instance, if the scene includes a park with a picnic table, add elements such as a family preparing food, laying out a picnic blanket, and children playing nearby to convey the setting and its use. Use action verbs and specific objects to enhance context, like a person reading a book under a tree, a dog fetching a ball, or birds flying overhead. Balance the focus between the main activities and the background elements to maintain a coherent narrative. Avoid vague descriptions that do not provide enough context for the scene, such as \"a park with people,\" and instead elaborate on the interactions and purpose within the setting to create a vivid, integrated image."
    },
    {
        "aspect": "Cultural Understanding",
        "introduction": "Interpreting elements that may have cultural significance, such as traditional attire, monuments, or events.",
        "guidance": "To effectively incorporate cultural understanding into an image generation prompt, specify the cultural context by including traditional attire, well-known monuments, and significant cultural events native to the chosen culture. Clearly describe the traditional attire, including colors, patterns, and accessories unique to the culture. Identify prominent monuments or landmarks, ensuring their architectural details and surroundings are accurately represented. Outline any cultural events, detailing their activities, decorations, and typical settings. Balance these elements by positioning the traditional attire prominently in the foreground, the monument as a focal point in the background, and the cultural event activities and decorations harmonizing the scene around them. Avoid common pitfalls by ensuring accurate cultural representation, preventing stereotypes, and verifying the authenticity of all described elements."
    },
    {
        "aspect": "Emotion and Mood Detection",
        "introduction": "Analyzing the overall mood or emotional tone conveyed by the scene and characters.",
        "guidance": "Describe the emotional tone you want to convey in the scene clearly, specifying the desired mood, such as happiness, sadness, tension, or tranquility. Detail the expressions and body language of the characters to match this mood; for example, smiles, tears, tense shoulders, or relaxed postures. Include visual elements in the environment that enhance the mood, like bright, vibrant colors for happiness, gloomy, muted tones for sadness, or soft, peaceful pastels for tranquility. Consider elements such as weather conditions, lighting (bright sunlight for joy, dim shadows for melancholy), and props that can contribute to the overall emotional tone of the image. Balance the focus on characters' emotions with the environmental cues, ensuring they complement each other to create a harmonious scene. Avoid mixing conflicting emotional signals, such as smiling faces in a stormy setting, to prevent misinterpretation of the intended mood."
    },
    {
        "aspect": "Background Analysis",
        "introduction": "Recognizing and understanding the significance of background elements and how they contribute to the overall scene.",
        "guidance": "Direct the AI to identify the primary subject of the image and then thoughtfully choose background elements that enhance and complement this subject. Specify the background's setting\u2014whether natural, urban, abstract, or otherwise\u2014and describe how its elements interact with the foreground. For example, if the subject is a person in a bustling city, the background should feature recognizable urban details like buildings, busy streets, or shop windows to create a coherent environment. Emphasize maintaining a balance where the background is detailed enough to contextualize the scene but not so dominant that it distracts from the main subject. Avoid overly cluttered backgrounds that can confuse the viewer or dilute the focus on the primary subject. Use background elements to provide visual depth, context, and atmosphere that enrich the overall narrative of the image."
    },
    {
        "aspect": "Temporal Context",
        "introduction": "Inferring the temporal context, such as identifying if the image was taken during a specific season or historical period.",
        "guidance": "Clearly specify the time period or season that should be represented visually. For a specific historical period, include key elements such as clothing, architecture, technology, and cultural artifacts typical of that time. For a seasonal context, incorporate indicators like foliage, weather conditions, clothing, and activities associated with the season. Ensure elements are balanced so that the identified time period or season is prominent but not overwhelming; for instance, a winter scene could feature snow-covered landscapes, bare trees, and people in winter attire. To avoid misinterpretations, don't mix visual elements from different time periods or seasons in a way that could confuse the temporal context. Maintain consistency in all aspects of the visual representation to clearly convey the intended time frame or seasonal setting."
    },
    {
        "aspect": "Scene-specific Details",
        "introduction": "Identifying and understanding details specific to certain scenes, such as identifying specific locations, landmarks, or objects relevant to a particular setting.",
        "guidance": "When crafting a prompt to include scene-specific details, begin by clearly specifying the primary location or setting of the image. Include definitive and recognizable elements unique to that setting to enhance authenticity. For instance, if the scene is set in Paris, ensure to mention iconic landmarks such as the Eiffel Tower or the Seine River. Next, integrate objects or features typically found in that scene, like Parisian street lamps, caf\u00e9 tables with chairs, and cobblestone streets. Emphasize the spatial arrangement to maintain a coherent balance: primary landmarks in the background, secondary but crucial elements such as people or objects in the foreground, and supporting details filling the scene. Avoid overly generic descriptions that could apply to multiple locations; specificity is key. Ensure the elements do not conflict in style or period, maintaining a cohesive visual theme aligned with the identified location."
    }
]