Generative AI tools have transformed creative work. They can now craft visually striking art, hold natural conversations, and even write complex computer code. In the past year these tools have evolved rapidly: they have gone from experiments producing fuzzy, often nonsensical images to tools capable of photorealistic imagery with genuine compositional quality.
Although these AI tools have their limitations, the productivity and quality gains they offer are immense. We wanted to harness Midjourney to generate evocative imagery in support of our user journey pages.
From its fun visuals to its interesting compositions, it's evident that Midjourney's essence lies in its ability to convey a story beyond words.
For our website work, we chose Midjourney over competing options such as DALL·E 2 and Stable Diffusion for a few key reasons. First, nearly every image it generates is tuned to go beyond the literal request: outputs tend to be well composed and visually appealing, rather than being exactly, and only, what was asked for.
Second, Midjourney outputs tend to be visually coherent and logical, with fewer floating artefacts of the image generation process that look out of place.
Crafting an AI Image
Let’s dive into what we did.
The process of generating an image from Midjourney seems quite simple at first glance – give it some text and receive an image as output. What adds considerable complexity and time, however, are the nuanced ways of interacting with the AI system to get it to produce the exact outputs you desire.
• The Ghost In the Machine – Midjourney’s AI is trained on large data sets associating language with images in a very fuzzy way. It seems to absorb an intrinsic understanding of concepts like “underwater” and “world” – resulting in imagery, colours and textures related to the concept appearing in the results without being asked for.
Example – The prompt “Underwater worlds” results in far more than just what the words literally mean.
• Language Matters – The text used in the prompt is the key as it tells the AI what sorts of imagery to pull up and interpolate between to create the resulting output. The prompts “Isometric 3D model of a city” vs “3D model of a city” will generate quite different outputs.
• Using Images – Images can be a strong form of assistance. The tool will try to understand the concepts present in the image and then combine this with your prompt to generate a new image.
• Using modifiers – These control specific aspects of the image generation process:
- --iw 0.5 : Image Weight is a parameter that modifies how strongly an image included in the prompt influences the result. It varies between 0 and 2.
- --stylize 500 : This parameter controls how strongly the AI’s pretrained artistic sensibilities of colour and composition influence the result.
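The pieces above – an optional reference image, the text prompt, and trailing modifiers – can be sketched as a small helper. This is a minimal illustration of how we assembled prompt strings; the `build_prompt` function and the example URL are our own, not part of Midjourney:

```python
def build_prompt(text, image_url=None, image_weight=None, stylize=None):
    """Assemble a Midjourney-style prompt string with optional modifiers."""
    parts = []
    if image_url:
        parts.append(image_url)  # image references go before the text prompt
    parts.append(text)
    prompt = " ".join(parts)
    if image_weight is not None:
        prompt += f" --iw {image_weight}"    # 0 to 2: how strongly the image steers the result
    if stylize is not None:
        prompt += f" --stylize {stylize}"    # how strongly Midjourney's own aesthetics apply
    return prompt

print(build_prompt("Isometric 3D model of a city",
                   image_url="https://example.com/ref.png",
                   image_weight=0.5, stylize=500))
# https://example.com/ref.png Isometric 3D model of a city --iw 0.5 --stylize 500
```

Keeping the modifiers at the end of the string mirrors how prompts are actually entered into the tool.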
• An image collage as an Input – One interesting mechanism of communicating with the AI is through showing it a collage of images that we stitched together. They say a picture is worth a thousand words, which turned out to be true here! In this case we wanted to combine infographics with an isometric home model so we included an image of each of these along with the text prompt.
• Consistent Style – To maintain a consistent style across images, we found the phrase “Digital teal aesthetic on black background” gave reasonably consistent results aligned to our brand, so most prompts had it appended to the end.
“A person making a choice between 3 options. Digital teal aesthetic on black background”
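Appending the same style phrase to every prompt is easy to automate. As a sketch (the `branded` helper and constant name are ours, purely illustrative):

```python
# Shared style suffix appended to most prompts to keep outputs on-brand.
BRAND_STYLE = "Digital teal aesthetic on black background"

def branded(prompt):
    """Append the shared style phrase so generated images stay visually consistent."""
    return f"{prompt}. {BRAND_STYLE}"

print(branded("A person making a choice between 3 options"))
# A person making a choice between 3 options. Digital teal aesthetic on black background
```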
• Manual Finetuning – Once we had an image we liked, in some cases we used Photoshop to touch it up manually: adding text and removing incoherent details. The AI currently does not handle text within images well, and we operate in an industry where intent is key, so we wanted to avoid any words that could add confusion.
• Controlling the Output – Getting the details right can be hard with this tool. It is very hard to edit specific details of an image or generate images based on preset object positions within the template images. There are tools that excel at control (like Stable Diffusion with Control Net) but these typically lack the artistic ability present in Midjourney’s AI.
A “craft” is typically an activity that requires large amounts of knowledge and skill accumulated over years of practice. Creating digital media is undeniably still a craft, one that we have augmented considerably by using AI. For our team to generate 30+ images aligned to relatively esoteric concepts such as ‘automation platform’ would otherwise have taken considerable time and effort.
By combining skills in writing the narrative, prompting the AI, editing the images and branding the content, Midjourney saved us considerable time and let us iterate on ideas quickly.
We coupled the image generation with some time spent with Google Bard to help with our website copy. This helped fine-tune our content and embed some core marketing principles into how we write and display our work.
There have been criticisms of generative AI, such as copyright claims, job losses, and the charge that AI is incapable of true creativity. As with any new technology, there will be negatives, and it can only be judged on balance and over time. The responsibility then lies equally with the user of the tool to use it ethically and productively. In our case, Midjourney provided a massive boost to the creative process of crafting images custom to our company brand.
The positives here were quite significant for us: our aesthetic sensibilities were amplified, and we realised there is no upper ceiling on the work that can be produced. As a result, we were able to move on to the tasks of making our platform as great as possible.