Emerging UX patterns in generative AI experiences

Reprinted with permission from: TCC Translation Information Bureau (ID: TCC-design)

Translator's recommendation: This article offers an in-depth analysis of how generative AI user experience has evolved, from the command line to the graphical interface and on to newer interaction patterns such as contextual binding and user curation, and shows how simplified workflows and trust-oriented design lower the barrier to use. It also proposes the idea of an "AI collaboration canvas" as a vision of future human-machine symbiosis at work, offering designers and developers key insights for rethinking the human-machine collaboration paradigm.

Have you ever met the kind of barista who knows everything about your coffee preferences? You don't have to spell out every detail to get your ideal cup: the temperature, extraction time, and water ratio; where the beans come from, how finely they are ground, and how darkly they are roasted. It's an enjoyable experience, and it captures the depth and direction of current AI exploration rather well.

This article is not really about coffee, though. It is about how user interaction evolves and adapts, centered on the historical development of graphical user interfaces and the latest developments in generative AI interaction. We will analyze the key directions of AI user experience design, including contextual design, user-guided engagement strategies, the construction of trust mechanisms, and collaborative interaction models within the ecosystem, and explore the future goals and practical paths of generative AI interaction.

From commands to conversations

In the early days of computing, users operated the machine by typing precise instructions into a command line interface (CLI). Even simple tasks such as opening files or copying data relied on strictly formatted commands. Imagine having to memorize the exact command needed to reach the Jobs folder every time: a real barrier to entry. This clearly was not for the average user; in practice, only programmers could use it. For computers to serve a wider audience, change was inevitable.

In 1964, ELIZA was born: one of the early experiments in natural language processing. Based on keyword recognition and pre-programmed responses, it could hold a basic conversation with users. However, despite its innovative design for the time, ELIZA offered only a very rigid and limited interactive experience.
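ELIZA's keyword-and-template approach can be illustrated with a minimal sketch. The rules below are invented for illustration and are not ELIZA's actual script; the real program also used ranked keywords and pronoun-swapping transformations.

```python
import re

# Illustrative keyword rules in the spirit of ELIZA: match a keyword,
# answer from a canned template. The rules here are made up for this
# example, not taken from the historical program.
RULES = [
    (re.compile(r"\bmother\b", re.I), "Tell me more about your family."),
    (re.compile(r"\bI feel (.+)", re.I), "Why do you feel {0}?"),
    (re.compile(r"\bhello\b", re.I), "Hello. What brings you here today?"),
]

def respond(message: str) -> str:
    """Return the first matching canned response, else a generic fallback."""
    for pattern, template in RULES:
        match = pattern.search(message)
        if match:
            return template.format(*match.groups())
    return "Please, go on."  # fallback keeps the conversation moving

print(respond("I feel tired today"))  # Why do you feel tired today?
```

Even this toy version shows both the appeal and the rigidity the article describes: anything outside the keyword list falls through to a generic reply.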

During the same period, Xerox's Palo Alto Research Center (PARC) pioneered the concept of the graphical user interface (GUI), which Apple commercialized in 1984 and quickly brought to the mass market, followed by Microsoft. By introducing icons, drop-down menus, and multi-window layouts, this interaction model replaced complex command-based human-computer interaction and added easy mouse navigation. It made operation more intuitive, lowered the barrier to entry, and laid the foundation for technology to enter daily life, ushering in an era of design-driven experience for technical products.

Examples of different interfaces. Today, ChatGPT's main form of interaction is text-based; how might it evolve in the future?

Take a look at the example image above. We are now at an inflection point, witnessing a parallel evolution: user prompts have essentially become mini-programs written in natural language, and the quality of the output depends heavily on the user's mastery of prompt engineering. Just as computing evolved from complex command-line interfaces to point-and-click GUIs and thereby became truly ubiquitous, we are seeing the same signs in generative AI: by encapsulating feature-rich, complex logic, products can offer a more intuitive and fluid interactive experience while the complexity of the underlying logic stays hidden behind a user-friendly interface.

The user interfaces of Stable Diffusion WebUI, Midjourney, and DALL·E 3 present complex image diffusion models and prompt handling graphically, reflecting very different design thinking and implementations.

Image generation tools such as Stable Diffusion WebUI, Midjourney, and DALL·E 3 require different levels of precision in prompt input. By comparison, Midjourney and DALL·E 3 are more user-friendly because of their convenience, while Stable Diffusion is more customizable and better suited to generating highly detailed visual output. With more insight into users' needs, it becomes easier to strike the right balance between an intuitive experience and users' needs for specific detail and personalization.

Contextual binding

By integrating relevant information into a single instruction, contextual binding greatly streamlines human-computer interaction and directly addresses the pain points of conveying complex instructions. This pattern achieves a closer match between user intent and system understanding, improves the convenience and quality of the workflow, eliminates the tedium of repeatedly adjusting or manually refining prompts, and frees up creative potential.

We have seen this trend in generative AI tools, such as example prompts in Edge, Google Chrome's tab manager, and trigger-word-based specialized identifiers in Stable Diffusion. These features are powered by techniques such as textual inversion, LoRA, model optimization, and targeted fine-tuning.

In context-bound applications, "conversational" AI is not strictly equivalent to holding a conversation. At its core, it focuses on what the user wants to achieve rather than relying solely on textual back-and-forth. Contextual binding gives users a direct path to their goal, letting them get the output they need without a lengthy, cumbersome conversation flow. The user experience is no longer limited to a general-purpose chat interface; it is designed more precisely around specific data and specialized experiences.

For example, tools such as Miro Assist, Clay's AI formula generator, and Scopus AI greatly improve the ease and efficiency of interaction by consolidating relevant information into a single, highly focused functional module.

Another way to extend contextual binding is to let users customize the properties of these bindings themselves. Incorporating user-adjustable preferences and personalization into the context not only deepens the customization of the experience but also provides a more efficient and accurate connection point for future product interactions.
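As a minimal sketch of this idea, the hypothetical helper below bundles a user's saved preferences and the current on-screen selection into one focused instruction, instead of asking the user to restate them each turn. All names here are illustrative, not any product's API.

```python
# Hypothetical example of contextual binding: user preferences and the
# current selection are merged into a single instruction, so the model
# receives full context without a multi-turn conversation.
def build_bound_prompt(task: str, selection: str, preferences: dict) -> str:
    prefs = "; ".join(f"{key}: {value}" for key, value in preferences.items())
    return (
        f"Task: {task}\n"
        f"User preferences: {prefs}\n"
        f"Selected content:\n{selection}"
    )

prompt = build_bound_prompt(
    task="Summarize the selection",
    selection="Q3 revenue grew 12% while costs fell 4%.",
    preferences={"tone": "neutral", "length": "two sentences"},
)
print(prompt)
```

The point of the pattern is visible in the output: the user's standing preferences travel with every request automatically, which is exactly the repetition that contextual binding removes.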

Contextual binding is not just about simplifying the conversation flow of human-computer interaction; more importantly, it helps users reach their goals quickly, whether that means executing a search, combing through key information, or completing a specific task. By turning lengthy, tedious operations into simple, intuitive interactions, this design pattern is particularly well suited to linear or highly repetitive tasks. But that raises a question: what happens when the need is open-ended exploration, or a vague goal that must be refined through iteration? Here, a well-crafted user feedback mechanism and a closed feedback loop become the keys.

User curation

While we have made real progress in making AI interaction more intuitive, significant gaps remain in meeting users' diverse needs for refined output and specific goals. This is especially true in scientific inquiry, creative collaboration, content creation, image optimization, and fine-grained editing. As AI's contextual understanding improves and multimodal technology matures, guiding users efficiently through increasingly complex interactions becomes ever more important.

Whether we realize it or not, as human beings we are constantly optimizing the way we experience the world (as shown in the diagram). This can take the form of latching onto key phrases in a conversation, or actively annotating passages in a book. While observing users brainstorming with ChatGPT, I noticed very similar annotation behavior. At the time, users could not interact with the boxed information directly, yet some of those elements still guided their next action. This suggests that even when the initially generated content does not fully meet the user's needs, clear action anchors can empower the user to process and decompose it further. Structuring output more effectively and supporting constructive feedback are key entry points for improving human-machine synergy and the professionalism of results.

Examples include Clipdrop, ChatGPT, HeyPi, Google Circle, and GitHub Copilot

As you can see, image inpainting, threaded conversations, and highlight interactions all represent an emerging trend: users strategically orchestrate specific modules of information to create a more contextual content experience and generate better results.

Take writing an in-depth research report as an example. The user journey often begins with broad information exploration and gradually narrows to the core points of analysis. During collection and evaluation, users progressively sort and refine a large amount of data and finally integrate it into their output. Throughout this process, identifying or tagging key pieces of content becomes the pivot of the whole workflow, giving the AI system more accurate content associations and contextual understanding. The process must also ensure that users can efficiently store this weighted content and adapt it to later use.

Users need to capture and retain key content and have their experience optimized around it. That requires deep insight into user outcomes and effective feedback mechanisms to collect and apply the relevant information.
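One way such "weighted content" could work in practice is sketched below: user highlights carry a weight, and the most strongly curated snippets are packed into the model's context first. The class names and weighting scheme are assumptions for illustration, not a description of any real tool.

```python
# Illustrative sketch of curation signals: user highlights are stored
# with a weight, and the highest-weighted snippets are selected first
# under a character budget. The weighting scheme is invented here.
from dataclasses import dataclass

@dataclass
class Highlight:
    text: str
    weight: float  # e.g. 2.0 for an explicit pin, 1.0 for a plain highlight

def assemble_context(highlights: list[Highlight], limit: int) -> list[str]:
    """Pick the most strongly curated snippets that fit the budget."""
    chosen, used = [], 0
    for h in sorted(highlights, key=lambda h: h.weight, reverse=True):
        if used + len(h.text) <= limit:
            chosen.append(h.text)
            used += len(h.text)
    return chosen

notes = [
    Highlight("background reading", 1.0),
    Highlight("key finding: latency doubled", 2.0),
]
print(assemble_context(notes, limit=40))
```

Under a tight budget, the explicitly pinned finding survives while the background note is dropped, which is the behavior the article's report-writing example calls for.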

User curation research shows that for generative AI to effectively support complex creative tasks, it must not only deeply understand users' information-interaction patterns but also accurately anticipate how users will interact with the content. By capturing and interpreting these "curation signals," AI tools can provide more targeted assistance, optimizing the user experience and increasing the value of the outcome.

Build reasonable trust

While generative AI has simplified the way users interact with technology, trust remains a central barrier to mass adoption. This challenge existed in the past and remains today, which makes building trust a key issue for bringing new AI tools into real, large-scale use.

Among the many theoretical frameworks that explain how users adopt and use new technologies, two are particularly instructive: the Unified Theory of Acceptance and Use of Technology (UTAUT) and the Fogg Behavior Model (FBM).

According to UTAUT, a user's willingness to use a tool or technology is driven by performance expectancy, effort expectancy (ease of use), social influence (group perceptions), and facilitating conditions (external support). For example, someone may try a customer management tool because they believe it will help them meet sales goals (performance expectancy), find its workflow simple and efficient (effort expectancy), see that using the tool has become common among industry peers (social influence), and find that it integrates seamlessly with the company's existing database (facilitating conditions).

In parallel, the Fogg Behavior Model views behavior as the combination of motivation, ability, and a prompt (trigger). For example, a user's decision to buy coffee can be traced to a need for caffeine (motivation), the money to pay for it (ability), and a visible coffee shop nearby, where the eye-catching sign acts as a situational trigger that prompts the behavior.
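The Fogg model is often summarized as B = MAP: a behavior occurs when motivation, ability, and a prompt converge. The toy sketch below encodes that convergence; the scoring and threshold are invented for illustration, not part of the model itself.

```python
# Toy illustration of the Fogg Behavior Model (B = MAP): a behavior
# fires only when motivation and ability together clear a threshold
# AND a prompt (trigger) is present. The numeric scoring is invented.
def behavior_occurs(motivation: float, ability: float, prompt_present: bool,
                    threshold: float = 1.0) -> bool:
    # Higher motivation can compensate for lower ability and vice versa,
    # but without a prompt nothing happens at all.
    return prompt_present and (motivation * ability) >= threshold

# The coffee example from the text: strong caffeine craving, money in
# pocket, and a visible coffee-shop sign acting as the trigger.
print(behavior_occurs(motivation=2.0, ability=0.8, prompt_present=True))   # True
print(behavior_occurs(motivation=2.0, ability=0.8, prompt_present=False))  # False
```

The second call shows why the article singles out the trigger: however motivated and able the user is, no sign means no purchase.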

Generative AI dramatically improves users' efficiency in reaching their goals by reducing the perceived cost of effort; in practice, many users have used it to break through procrastination. However, the key to a strong onboarding experience and continued, sticky use is building and reinforcing the user's trust in the system.

There are many theoretical perspectives on designing for trust, including the frameworks above. Here, though, we try to distill the core elements of trust further: the user's past experience, individual risk tolerance, the stability and consistency of the interaction, and the surrounding social field and context.

Past experience: When users encounter a new experience, their existing cognitive preferences and accumulated experience form their psychological baseline. This foundation of trust matters: rather than rebuilding from scratch, adopting familiar interfaces and operating patterns helps users extend old trust into the new experience, which is clearly more efficient than doing the opposite. In conversational AI, rather than explicitly asking users to fill in or type prompts, it may be more efficient and less intrusive to use the mimicry effects of conversation, shaping user behavior gradually through natural, fluent guidance and responses.

Risk tolerance: Risk tolerance reflects the user's attitude toward avoiding negative consequences; the key is to identify which risks the user can accept and which they will not compromise on, and then design so that risk stays within that tolerance. Strategies that influence perceived risk include increasing the transparency of the interaction, giving users more control, guiding users to authorize actions explicitly, and ensuring product compliance. Visual appeal and a refined experience can also lower users' perceived risk. Of course, each scenario requires a tailored approach: a conversational AI for the medical field, for instance, demands extremely low fault tolerance given the stakes of diagnosis. To reduce potential risk, it can cite authoritative literature, refine its prompt interactions, and explain dissenting interpretations, making its output more transparent and reliable, strengthening the trust of doctors and patients, and improving the safety of the experience.

Interaction consistency: Interaction is not only the presentation of results but also the path users take to reach the desired output. Users should never be confused about whether different words, situations, or behaviors convey a consistent meaning. Improving consistency requires systematic coordination from interface layout to button copy. In conversational AI, consistency can show up as a unified response format and stable word semantics throughout the conversation. For example, when a user requests a summary of a topic, the system should not return prose one time and a bullet-point list the next unless the user explicitly asks for the change.
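The summary example above can be sketched as a small formatting rule: the default shape of the answer is fixed, and it only changes on an explicit user request. The function and format names are hypothetical.

```python
# Illustrative sketch of interaction consistency: summaries default to
# one fixed format (bullet points) and deviate only when the user
# explicitly asks, so the same request never yields two different
# shapes of answer.
def format_summary(points: list[str], requested_format: str = "bullets") -> str:
    if requested_format == "prose":  # explicit user override only
        return " ".join(points)
    return "\n".join(f"- {p}" for p in points)

points = ["Revenue grew 12%", "Costs fell 4%"]
print(format_summary(points))                              # bullet list
print(format_summary(points, requested_format="prose"))    # single paragraph
```

Centralizing the choice in one place is the design point: no code path can accidentally produce a differently shaped summary, which is exactly the kind of drift the article warns against.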

Social environment: The social environment is one of the dimensions that most directly affects user trust. It may include buy-in from a leader or authoritative figure in the organization, or come from embedding the product in an already-trusted network, such as integrating with enterprise software that is trusted and widely used. To realize this social trust, we can apply social proof and build more social-proof touchpoints into the interaction context. For an internal-database LLM, that might mean highlighting successful work by users, especially their immediate teams, on the system. Emphasizing that the system understands and handles specific internal data both strengthens trust and shows that the system has been validated within the existing trust environment.

What are the key dimensions to prioritize when designing AI experiences to build user trust? By dissecting and optimizing these core elements, you can help ensure that AI meets users' expectations and needs, increasing acceptance and driving widespread adoption. Building trust is not only a major lever in experience design but also indispensable to the future adoption and deep integration of generative AI tools.

Contextual ecosystems

This article has focused on the emerging trends of contextual binding and user curation, and on how design strategies build user trust. Taken as a whole, generative AI has redefined productivity by dramatically lowering the barrier for ordinary users to start a task, much as graphical user interfaces (GUIs) did in their early days. But the boundaries of contemporary user experience have long since moved past windows and pointers. So where does generative AI evolve next? That question invites plenty of expectation and conjecture.

The graphical user interface deepened and sped up the user experience by supporting parallel work across multiple windows. Users can switch seamlessly between tasks, working on financial statements in one app and designing presentations in another, greatly improving cross-task process management. This interaction mode harnesses user intent across applications, connecting work efficiency with creative output in many scenarios.

Emerging examples include Edge, Chrome, and Pixel Assistant, which integrate AI capabilities deeply and let users interact with these tools through generative AI. In this framework, the large language model understands system functionality, extending its application from the traditional chat window into broader intelligent service design and breaking the limits of the original application frame.

Looking back, the graphical user interface gave users a digital canvas with clear advantages over physical environments in efficiency, scalability, and productivity. Generative AI looks set to follow a similar path: AI will no longer be a mere tool but a collaborative partner that redefines daily life as a shared, inclusive experience. Going forward, an ecosystem driven by generative and conversational AI may take shape, connecting diverse vertical intelligent agents in a seamless, fluid workflow. This collaborative model would help users bridge the digital and physical worlds, deepen the value of interaction, and deliver integrated, efficient experiences across scenarios.

That future is not limited to conversational interaction or emotional companionship; generative AI will go deeper into content creation. Today, users interact with AI-generated content, but the design and ownership of the "canvas" still rest largely with the AI. As human-centric AI products mature, the next stage is a space where AI and users collaborate in real time on the same creative canvas. Early tools like Grammarly began exploring this possibility, and newer generative tools like GitHub Copilot have accelerated it. We can think of generative AI as a co-creator, with the user ultimately controlling and directing the output. As technological boundaries extend and user awareness grows, generative AI may reach further still, serving not only individuals' digital workflows but also, through the Internet of Things, intervening in and even optimizing the physical spheres of daily life. Augmented reality, too, could redefine how we live and work, tightening the seamless connection between people and technology.

The interaction patterns of generative AI are retracing the trajectory of human-computer interaction. As we optimize interaction scenarios, fold context into simplified operations, empower users to curate for themselves, and integrate seamlessly into existing ecosystems, generative AI becomes not only more trustworthy, efficient, and easy to use, but also more accessible and inclusive, unlocking its potential for more users.

In Beijing, a pre-publicized gale
2025-04-14 13:48:54