Formal Usability Test Design

Test 1: "Am I On Track" UI Element

Background and Hypothesis

In our heuristic evaluation, we found that progress bars did not help our users gauge how far along they were in the process, because it was unclear how individual tasks were weighted. We also found that our users value knowing where they are relative to where they should be, and that it was important for them to see a projected date for when they would finish all of their tasks.

Hypothesis #1 (Different Styles): If we include a UI forecaster element that communicates to users the difference between where they should be and where they are, then they will feel more inclined to take action and less stressed about the amount of work they have to do.

Hypothesis #2 (Different Placements): Depending on the placement of this UI forecaster, users will understand which categories need attention and not feel overwhelmed by the potential amount of work presented.

Hypothesis #3 (Color Breakdown): If we include a color breakdown of which categories need attention in the overall forecast bar, users will know which of the specific categories they are behind on and feel motivated to take action on them.

Independent Variables - What will vary?

We will vary three things about the “Am I on track” bar: the placement of the bar, the UI design of the bar, and the color breakdown for all of the categories that need attention.

For placement of the bar, the two options are: 1. Overall progress only: a bar representing progress on the overall wedding planning (on a page above all of the categories). 2. Overall and per-category progress: the overall bar plus a bar representing progress on each category (on each category page). We are not placing the “Am I on track” bar on just the categories. This was in our initial design, but our heuristic evaluators told us it was confusing; they preferred to see overall progress alone, or overall progress together with category progress.

For the UI design of the bar, we have two styles (seen below):

  1. Option 1 uses a half dome to represent their progress. On the dome, there would be two pointers, one marking the user’s progress and the other marking their progress relative to other users.
  2. Option 2 uses a timeline to represent their progress. On the timeline, there would again be two triangle pointers, one marking the user’s progress and the other marking their progress relative to other users.

Each of our wedding categories would have a unique color. For the color breakdown of all the categories that need attention, we have two options:

  1. The space between the pointer that marks the user’s progress and the pointer that marks their progress compared to other people would be colored according to which categories need attention.
  2. No color breakdown, just a red glow over the difference between the pointers when the user is behind.

Dependent Variables - What response will we see?

Hypothesis #1: Measuring stress and motivation level

  1. Do you feel stressed with your progress tracker?
    • Yes/No
  2. Do you understand the status of your progress based on the overall tracker?
    • Yes/No
  3. Do you feel motivated to complete your tasks based on your tracker?
    • Yes/No

Hypothesis #2: Measuring stress level

  1. Do you feel stressed by the progress tracker regarding the overall wedding status?
    • Yes/No
  2. Do you feel stressed by the progress tracker regarding the overall wedding status and regarding the status of each category?
    • Yes/No

Hypothesis #3: Measuring motivation level

  1. Do you understand which categories require action based on color?
    • Yes/No
  2. Do you feel motivated to take action on tasks based on these colors in the progress tracker?
    • Yes/No

Patterns in Response Variables

"Am I on track" UI element:

While someone can say they are “stressed” or “motivated” with very binary (yes/no) answers, we need a quantitative way to confirm these feelings.

To measure if someone is stressed:

We will look at how often a user lands on the progress bar page and then exits the app or the page. Later, we can analyze time as a factor: for example, if someone lands on the progress page and sees a color or style that makes them feel stressed, they are more likely to exit immediately.
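As a rough sketch of how this exit metric could be computed from event logs, assuming a hypothetical event format and an arbitrary 5-second threshold for what counts as an "immediate" exit (neither is part of the study design):

```python
# Hypothetical event log: (user_id, event, seconds_since_page_load).
# Event names and the 5-second "immediate exit" threshold are
# illustrative assumptions, not part of the study design.
events = [
    ("u1", "view_progress_page", 0.0),
    ("u1", "exit_app", 3.2),
    ("u2", "view_progress_page", 0.0),
    ("u2", "click_category", 14.5),
]

IMMEDIATE_EXIT_SECONDS = 5.0

def immediate_exit_rate(events):
    """Fraction of progress-page views whose next event is an exit within the threshold."""
    views, quick_exits = 0, 0
    for i, (user, event, t) in enumerate(events):
        if event != "view_progress_page":
            continue
        views += 1
        # Find this user's next logged event after the page view.
        for later_user, later_event, later_t in events[i + 1:]:
            if later_user == user:
                if later_event == "exit_app" and later_t - t <= IMMEDIATE_EXIT_SECONDS:
                    quick_exits += 1
                break
    return quick_exits / views if views else 0.0

print(immediate_exit_rate(events))  # u1 exits within 5s, u2 clicks instead -> 0.5
```

Aggregating this rate per prototype variant would give the quantitative stress signal the binary survey questions lack.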

To measure if a user is aware of their progress:

There is a difference between being aware you are behind and being motivated. A user could know they have a lot of tasks left to do, but not know how to catch up, or why it’s important that they catch up. To measure awareness of progress across our two styles, we will observe whether (A) users click on the portion of the half-dome bar to see which categories they are behind on, or (B) when presented with the horizontal progress bar, they click on the running-man icon, which tells them how to catch up.

To measure if someone is motivated:

To measure whether or not people want to take action after seeing their progress we will be analyzing whether or not users click on the categories that require attention and the task completion rate.

Test Conditions

Ideally, we would test these prototypes with at least 8 people (and preferably many more) in order to understand what the best interface is for our application. Our team currently has 8 possible interactions for our forecasting element, one for each combination of the different options, so we need at least 8 people to test them. If available, testing over 32 people would be ideal, so we could see whether a majority of people presented with the same interface had similar reactions to the elements in question. Each participant will go through only one interface, since going through multiple interfaces would introduce bias into the experiment. At the end of the test, after we have gone through the questions needed to understand how the user responds to our response variables, we will present them with the other design options to determine which of our potential designs resonates with them more visually, but we will not go through the entire usability study again.
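The count of eight comes from crossing the three two-level independent variables (2 × 2 × 2 = 8). A quick sketch of the full condition matrix, using our own labels for each option:

```python
from itertools import product

# The three binary independent variables from the test design.
# Labels are our own shorthand, not from any implementation.
placements = ["overall only", "overall + per-category"]
styles = ["half dome", "timeline"]
color_modes = ["per-category colors", "red glow only"]

conditions = list(product(placements, styles, color_modes))
assert len(conditions) == 8  # one participant minimum per condition

for i, (placement, style, color) in enumerate(conditions, start=1):
    print(f"Condition {i}: {placement} / {style} / {color}")
```

With 32 participants, each of the eight conditions would be seen by four people, which is what makes the within-condition majority comparison possible.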

Test 2: Enhanced Signup Flow

Background and Hypothesis

If our sign-up flow is longer and includes more help text, then users will understand our definitions of a task and a category, how and when tasks are assigned, and what marking a task as done means.

Independent Variables - What will vary?

Internally, we have always evaluated our sign-up flow in terms of the burden placed on the user before they can see the value added by our app. Our current sign-up flow asks the user for their and their fiancé’s names, their wedding date and location, and what their availability is for completing the tasks we assign them to plan their wedding. The results of our heuristic evaluation and our cognitive walkthroughs from the Design Refinement phase showed that users were filling out this information without knowing that it could be changed later, and were not receiving any insight into the main interaction of the app: viewing, doing, and completing tasks.

We will be designing different sign-up flows that vary in terms of length and language used and presenting each flow we design to an individual user. We will keep constant the format of the sign-up flow and the set of interactions for moving through it. The user signs up for this app in a desktop interface and is presented with a walkthrough where we solicit information from them. We do not intend to say that this walkthrough process is our final decision on the sign-up flow, but we will keep it constant so as not to interfere with the dependent variables we expect to observe.

This test does not require participants who have just started planning their own weddings. We only need to ask users who have never interacted with the app, and who don’t know our definitions of tasks, categories, etc., to complete the walkthrough. For each sign-up flow that we present to each individual user, we will try to standardize conditions, presenting it similarly to a cognitive walkthrough and observing the dependent variables after the walkthrough is completed.

Dependent Variables - What response will we see?

After completing one of the sign-up flows that we have designed, we will be asking the user the following questions:

  • What is a task?
  • What is a category?
  • At what times will you be assigned a task by the app?
  • What does marking a task as done mean?

We will record the user’s answers to these questions, compare them to our own answers, and score how well the user did out of 5 points. We will assign partial points depending on how accurate a user’s answers are and how well they are in line with our expectations.
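To keep the partial-credit grading consistent across team members, the scoring could be sketched as below; the equal weighting of the four questions and the [0, 1] per-question credit scale are our assumptions, not something fixed by the study design:

```python
# Illustrative rubric: four comprehension questions, each graded with
# partial credit in [0, 1], scaled to a total out of 5 points.
# Equal weighting per question is an assumption.
QUESTIONS = ["task", "category", "assignment timing", "marking done"]

def score_participant(partial_scores):
    """partial_scores: per-question credit in [0, 1]; returns a score out of 5."""
    assert len(partial_scores) == len(QUESTIONS)
    assert all(0.0 <= s <= 1.0 for s in partial_scores)
    return 5.0 * sum(partial_scores) / len(partial_scores)

print(score_participant([1.0, 1.0, 0.5, 1.0]))  # -> 4.375
```

Recording the per-question credits, not just the total, would also let us see which concept (tasks, categories, timing, completion) each flow fails to teach.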

We will also be asking the user, on a more qualitative level, about how burdensome they found the sign-up flow. We can ask the user to compare our flow to other sign-up processes, if it felt long or short, if they would have liked to skip some steps in the flow, and if they were aware that any information we are soliciting from the user will be editable once the sign-up flow is completed.

Patterns in Response Variables

Including more help text and being clearer in our sign-up flow will help our users understand key definitions in our app - primarily the meaning of a task, a category, how and when tasks are assigned, and what marking a task as done means. If we present users with a more complete sign-up flow with clearer language and useful graphics, we expect that users will be correct more often and more confident in their answers.

Additionally, we presume there may be a tipping point where the returns in knowledge from increasing the length of the sign-up flow diminish, and users instead become frustrated by how long it is. If we see users getting frustrated with the length of the sign-up flow, we plan on noting this and considering it as a tradeoff to the familiarity potentially gained.

Test Conditions

Ideally, we hope to test 6 participants who will each go through 1 iteration of the test. Unfortunately, we are limited to 1 iteration for each participant, as we’re testing their familiarity and understanding of the system on a high level. A participant’s bias would clearly carry over from one iteration to the next, and we wouldn’t get useful data from asking the same questions. We do, however, plan on showing other possible iterations on the sign-up flow design (if available), and asking the participants about their thoughts on the differences and what components may be better or worse (to the best of their ability). It should be noted that our team has no prior experience in the design of controlled experiments, at least in the context of design usability tests.

Formal Usability Test Results

Test 1: "Am I On Track" UI Element

Overview of Observations

We interviewed two people. We showed them these two designs, the arch and the timeline:

We observed that both of them were stressed by being compared to the average American planning their wedding. The first person we interviewed said, “Oh f***, most users are here [pointing to where most users are] but I’m here [pointing to where he was in the planning process].” After that, his body language and words became slightly competitive: he kept saying he had to catch up, and kept leaning in. He then admitted, however, that this was stressing him out. The second user was also stressed by the comparison to most Americans, because she did not know where the number was coming from. In particular, hers said she would finish planning multiple days after her wedding because she was behind most Americans, and this stressed her out the most.

In our timeline design, we also included a running-man icon, shown when the user was behind, to inform them how to catch up. We need more data, but the user we tested this prototype with said that the running man was misleading. We saw him get really excited to click on the running-man icon to catch up, then collapse back into his chair when he saw the popup. He said he had expected much more: he expected “artificial intelligence” to tell him how to catch up.

Both of them felt anxious about the bright shade of red on the screen. For the first user, this was stressful because it was the brightest thing on the screen and felt like it was shouting. For the second user, it had more of a motivating effect: she knew which tasks needed work because of the bright color. Both of them said that both of our designs felt binary, either they were behind or they weren’t, which is not realistic for humans trying to stay on track. They both recommended a gradient instead that reflects how far off track they are. For example, a gradient from yellow to orange to red, where yellow means something like “you’re a little behind, but take on a task or two the next couple of times you want to receive tasks and you will be fine,” orange means something like “you should take on a handful of tasks per time you want to receive tasks to catch up,” and red means something like “Warning: we need to give you a new schedule. You might even have to take tasks on days you did not want them.”
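The gradient the users suggested could be prototyped as a simple threshold mapping from how far behind a user is to a status color. The threshold values below are illustrative assumptions; the users suggested only the yellow/orange/red semantics, not specific numbers:

```python
# Map how many tasks behind schedule a user is to a gradient color,
# per the testers' suggestion. Threshold values are illustrative
# assumptions, not something the users specified.
def track_status_color(tasks_behind):
    if tasks_behind <= 0:
        return "green"   # on track or ahead
    if tasks_behind <= 2:
        return "yellow"  # slightly behind: pick up a task or two
    if tasks_behind <= 5:
        return "orange"  # behind: take on a handful of tasks per session
    return "red"         # far behind: the schedule needs to be rebuilt

print([track_status_color(n) for n in (0, 2, 4, 9)])
# -> ['green', 'yellow', 'orange', 'red']
```

Testing where these cutoffs should sit (and whether they should scale with time remaining before the wedding) would itself be a candidate for a future study.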

We also got feedback that the arch read as a progress bar, and to the second user a progress bar means progress. It did not evoke a sense of urgency to complete things in the future, but rather excitement that she had completed things in the past.

Response Variables Report

We asked our users binary yes/no questions about how well they comprehended the tasks that needed to be completed, their stress level when seeing their progress, and their motivation to complete the tasks. Both users felt that they understood they were behind based on the colors, and both felt stressed when looking at their progress. However, users felt more motivated to complete a task, and reported less stress, with the progress bar design (Option B) than with the arch (Option A). Furthermore, people felt more motivated to complete a task when there were progress bars on every category, as opposed to seeing only the overall progress bar.

The other behavior we observed was how quickly a user exited the categories page, as that indicated they felt an overwhelming amount of stress, enough to make them want to leave. We found in both tests that when the individual categories did not have progress bars, people were quicker to exit the page.

Response Variables Interpretation

If surveying more people yielded the same result, that users felt less stressed by the progress bar than by the arch, it would mean the progress bar is a more effective way of communicating progress to our users. This is also indicated when users more frequently exit the arch category page for a different page, as it illustrates that they felt so overwhelmed by how far behind they were that they had no incentive to resolve the problem. Furthermore, some of the stress could also derive from our colors, since our users indicated that they understood they were behind based on the colors.

In terms of key metrics, we observed that our users clicked on the “Categories that Need Attention” buttons much faster when there were progress bars on the category buttons, as opposed to seeing only the main progress bar. This timing tells us that the progress bars in each category are effective at telling our users where they need to start taking action first.

Future Work

Based on what we observed in our test sessions, there are a few design features of the tracker element that we feel comfortable moving forward with. To begin with, both of the people we spoke to preferred the linear tracker over the arch-style tracker because it made them feel less anxious about the amount of work they needed to complete, so we will be moving forward with a linear style.

Additionally, we got similar feedback regarding the style and content of the category cards that need attention, and based on this we plan to create a hybrid of what the two users experienced that incorporates feedback from each of them. These “need attention” category cards will be red, which people felt indicated needed attention better than a red mark on a blue card. Each card will also include its own tracker element so that users can see their status for every category, not just their overall status, which users felt was essential for prioritizing which categories to complete tasks in first. Both of our users also felt that comparing their status against other users’ status was less relevant to them than comparing their status to their own expected status, which is another change we plan to implement in the final design. An initial mock-up of these new features can be seen below.

We also got some feedback on the colors of the app from one of our users; however, because this is such a small subset of people, we are not committing to changing the colors unless the team as a whole finds a new color scheme that we all believe is superior to our current one and follows the feedback we received.

If we had unlimited time to work on improving and testing our designs, our first step would be to test with more users. The two people we interviewed gave some feedback that was identical and some that was directly contradictory, which makes it difficult to determine which design to move forward with. Based on the feedback so far, we would like to continue testing different linear-style progress trackers, as the consensus between the two people we spoke to was that this style was clearer and less stress-inducing than the arch style. However, since this is only our first iteration of the linear tracker, rather than the linear progress bar we were using in earlier designs, in an ideal world we would continue testing different versions of this tracker before finalizing it. Additionally, we would like to test more color schemes with more users. Since we were testing with such a small subset of people, we do not feel comfortable overhauling the color scheme without showing it to at least some users. If we had more time, we would like to test a variety of color schemes with users to see which they had the largest positive reaction to.

Test 2: Enhanced Signup Flow

Overview of Observations

For this usability study, we tested an enhanced sign-up flow that is longer and includes more help text, to see whether users understand how we define tasks, categories, and marking a task as done, and how and when tasks are assigned. We designed two such flows, consisting of the same form and two slightly different walkthroughs. For each test participant, we tested only one flow, as showing them any other sign-up flow after the first would add bias and give them additional information for answering the questions we posed about tasks, categories, marking tasks as done, and when tasks are assigned.

We showed each sign-up flow to two participants. We then asked them the following 4 questions.

  1. We asked them to provide an example of a task if they were given a blank card, instead of the ones shown in the image below.
  2. We asked them to provide an example of a category other than Venue or Budget which we had already introduced as categories in this sign-up flow.
  3. We asked them when they expected our app to assign them tasks. The answer to this question was filled out by the test participant when they completed the form shown below before moving on to the walkthrough component we designed.
  4. We asked them what they expected to happen when a task is marked as done and where they expect to find this completed task in our app.

Users were able to answer these 4 questions successfully for both of the sign-up flows we designed. We did encounter cases where users did not read the additional help text we added, but we found that the structure of the flow (form first, followed by walkthrough) was beneficial in conveying understanding and helping them answer the questions successfully.

Response Variables Report

In our design, we indicated that we planned to study the ability of our users to identify and accurately define key components and concepts in our app. When asked to provide an example of a task, all 4 of our testers gave us an example that fit well with our system’s definition. When asked to give an additional example of what a category might be, all users provided examples that fit well within our internal definitions. There was, however, some confusion as to whether subcategories existed, which we plan to address. In response to the question ‘At what times and days will you be assigned a task?’, all users recalled that they had supplied us with the days they were free as part of the sign-up flow. Finally, all 4 users were able to correctly identify where ‘Done’ tasks go, though we did not do a very effective job of describing or showing exactly where the archived list exists.

Response Variables Interpretation

If the results held for a larger user population, it would mean that our introduction and tutorial do a sufficient job of introducing the key concepts and definitions. There is some underlying sentiment that some of the text is too long and verbose, but as the overall goal is to exhaustively ensure that users understand what tasks and categories mean, we can cautiously say that, with some minor adjustments and improvements, our sign-up flow is at least mildly effective.

It’s important to note that this pilot was not largely representative of the user population - none of our users were actually getting married, and we observed varying levels of interest and engagement with the sign-up flow and tutorial.

Future Work

One of our largest concerns is that our test participants skimmed over the help text we added in both of our sign-up flows. The test in its entirety took less than five minutes, but we still had test participants who found our sign-up flow too burdensome and told us we were providing help text for things like the search bar and the calendar when it wasn’t needed. Most people don’t read help text, so we need to help users make the right decisions and understand fundamental concepts, like what a task is, without relying solely on help text.

In the next week, we plan to implement a different style of walkthrough than the ones we have designed, where we prompt the user to click into certain areas of the app before we start showing them help text. This simple action should show the user where tasks and categories can be found even if they don’t read any additional help text beyond that. We also plan on designing a Category page for our desktop interface, which will more cohesively introduce the idea of categories and show the user the ones we have prepared for them as part of the walkthrough.

We would love to iterate upon this usability test design process and test with more sign-up flows in a more controlled environment. Ideally, we would be looking to find the correct amount of help text needed to both maintain the test participant’s level of understanding of the main interactions within our app and not feel burdensome or excessive. Unfortunately, we do not have time to pursue this so we are operating with the working knowledge that shorter is better when it comes to help text and that we should be designing a sign-up flow that, in the absence of reading any help text at all, still communicates an adequate level of understanding.