Buttons (and more) in Mixed Reality

Dirk Songuer
10 min read · Aug 26, 2019

Last week I was traveling. I also thought about buttons quite a bit. Before I go into details, a quick disclosure: I work for Microsoft on Mixed Reality things and I will talk about some related topics. That said, all views are mine.

For StereoKit, Nick Klingensmith had hacked in some buttons to be able to spawn new items and clear the scene. Simple but effective.

But that got me thinking. I have worked with a number of AR/VR/MR toolkits, but as a developer I didn’t really like how most of them handled UI interactions.

From 2D planes to 3D spaces

In 2D space, interactions are relatively simple as everything happens on a flat plane: the screen. The user moves the mouse, and the cursor moves accordingly on the plane. The user clicks, and something happens at that position. The user taps a certain position on a touchscreen, and it is mapped to the corresponding position on the plane. And while there are weird things like Force Touch, these are mostly just different types of clicks.

So when looking at how interactions are handled in 2D space, there are only so many interesting things that can happen: the mouse pointer enters or leaves the boundaries of the UI element, and a button is pressed, held or released in relation to the element.

Yes, you can do some really amazing things with those two simple concepts. Microsoft’s Fluent Design System or Google’s Material Design use depth, motion, light and materials to make elements on the 2D plane behave 3D-like. But with mixed reality you actually are in 3D space and so are your UI elements. Users can walk around your UI. They can be too far or too close to reach or interact with them. They can be distracted or looking somewhere else. And don’t get me started about accessibility.

Hence there are frameworks that include a lot of functionality and components to help with application development. Most of them also include some form of user interface elements.

Microsoft has adopted a philosophy called instinctual interactions. This includes three primary interaction models that suit the majority of mixed reality experiences. Each of these interaction models is optimized for a set of customer needs. I am mostly interested in direct manipulation: Leveraging the power of hands, with which users are capable of touching and manipulating the holograms directly. In other words: You reach out and grab the thing.

MRTK — Mixed Reality Toolkit’s Hand Interaction Examples with HoloLens 2

Leap Motion has some excellent advice and experiments as well. They also discuss multiple interaction models and suggest teaching the users to progress from direct to metaphorical to abstract concepts.

Project North Star: Desk UI

In UX terms these interactions make sense. As a developer I’m not happy with how they are usually implemented — or rather represented in code. For example, let’s look at the Microsoft Mixed Reality Toolkit for Unity and which events the UI component for a button provides:

You can configure when the pressable button fires the OnClick event via the PhysicalPressEventRouter on the button. For example, you can set OnClick to fire when the button is first pressed, as opposed to being pressed and released, by setting Interactable On Click to Event On Press.

To leverage specific articulated hand input state information, you can use pressable buttons events — Touch Begin, Touch End, Button Pressed, Button Released. These events will not fire in response to air-tap, hand-ray, or eye inputs, however.

That’s fine, I guess. It’s perfectly functional. It mimics how a 2D button behaves in terms of functionality and events. Buttons in MRTK are actually based on Interactables, which define what kind of controller(s) can interact with a UI element and provide a way to theme them. And that’s also a neat way of transferring how 2D components can work in 3D. But that’s my issue: It feels to me like we took an abstraction of the real world and abstracted it back into the real world. Like translating something into another language and then back again.

To be clear: I don’t know if there really is a better way. A lot of amazing people worked very hard on these toolkits. And they are very functional. I just have a vague feeling that from a developer point of view, there might be something else.

Wow, that was a really long excuse to sit down, have coffee and wildly speculate. Only to maybe come up with the same thing anyway. 🤔😅

Back to the playground!

The first thing I did was to grab and play with some physical buttons: on my computer and other devices I had around me, on walls, in elevators, on old and new objects. I also rummaged in my electronics drawer and got out some spare buttons. Then I poked, pushed, pressed, squeezed, hit, slapped, tapped and fist-bumped all of them.

A selection of physical buttons I played around with

Most guidance around direct manipulation of holograms revolves around following the rules of the physical world. Even if a hologram obviously defies the laws of physics (by hovering in front of the user for example), users still expect it to behave a certain way when they “pick it out of the air” and turn it around in their hand. As a creator you might change the laws of physics (venturing into metaphorical or abstract interactions), but these laws should be consistent within your experience and thus interactions and their results should be repeatable.

So one assumption is that instead of using predefined, keyframe-based animations for UI elements I would like to use completely dynamic, physics-based behavior. That way I could define the “rules of the world” and all elements would react consistently.

Another assumption is that I want to use an entity component system.

Based on those assumptions I came up with a couple of rules.

#1 The user should be able to touch and interact with everything.

In the physical world we can touch pretty much everything we see. The act of touching and experiencing things is actually very valuable. It helps us build a mental model of the world based on our previous experiences with similar objects. If you see a ball, a stone or a piece of wood, you will have a good idea about its weight, texture and other properties. You need a heavy object to prop open a window? Look around and you will see your mental model at work as you assess the objects in your surrounding environment.

Much like in the real world, touching and trying to interact with objects should be the default in mixed reality. Of course there is no haptic feedback (yet), so the properties of the object should be encoded in the visual and audio feedback: Can this object be moved? How easily can it be moved? Is there something attached to the object? What forces apply to it? Even things like the “temperature”.

In terms of development that means that every object (entity) has physical properties (components), all of which are interpreted by systems that provide visual and / or audio feedback whenever a user interacts with them.
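
To make that concrete, here is a minimal, engine-agnostic sketch in plain C#. All names (PhysicalProperties, TouchFeedbackSystem and so on) are hypothetical, just to illustrate the entity / component / system split:

```csharp
// A minimal, hypothetical entity / component / system sketch (plain C#).
using System;

public class PhysicalProperties          // component: physical properties of an object
{
    public float MassKg;
    public float Friction;
    public float TemperatureC;
}

public class Entity                      // entity: just a name plus optional components
{
    public string Name = "";
    public PhysicalProperties? Physics;
}

public static class TouchFeedbackSystem  // system: interprets components on interaction
{
    public static void OnTouch(Entity entity)
    {
        if (entity.Physics == null) return;
        bool movable = entity.Physics.MassKg < 10f;          // arbitrary threshold
        string sound = entity.Physics.MassKg < 1f ? "a light tap" : "a dull thud";
        Console.WriteLine(
            $"{entity.Name} {(movable ? "nudges slightly" : "does not budge")} and plays {sound}.");
    }
}
```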

#2 Mechanical interactions are just physics

One way things “work” is through mechanical interactions. Open a door using its handle to see what I mean. A pin is blocking the path of an object, and once the pin is moved, the object can move freely. Add properties like elasticity and you have a crossbow. Pretty much all behaviors can be built purely using a physics engine (performance aside).

But that also implies that there is a robust physical model of the world — or rather the augmented / digital parts of the world. How does gravity work? How are objects connected to each other? In technical terms the physics systems also act on the properties of the objects. If you are daring, you can also implement chemical interactions with this.
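
As a tiny illustration that button-like mechanics really are just physics, here is a hedged sketch of a spring-loaded pin, simulated with Hooke’s law and a semi-implicit Euler step. The numbers are made up; a real physics engine would handle this for you:

```csharp
// A made-up spring-loaded pin, simulated with Hooke's law (F = -k * x)
// and a semi-implicit Euler step.
float position = 0.02f;        // pin pushed in by 2 cm
float velocity = 0f;
const float k = 200f;          // spring stiffness in N/m
const float mass = 0.05f;      // 50 g pin
const float damping = 0.5f;    // simple viscous damping
const float dt = 1f / 60f;     // one 60 Hz frame

for (int frame = 0; frame < 120; frame++)
{
    float force = -k * position - damping * velocity;
    velocity += force / mass * dt;   // update velocity first (semi-implicit Euler)
    position += velocity * dt;       // then position, using the new velocity
}
// After ~2 simulated seconds the pin has settled back near its rest position.
```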

#3 Electrical interactions are magic

Mechanical interactions can usually be understood by looking at things. Electrical interactions are usually spatially disconnected, meaning the cause and effect might be in different locations. For example: Press the switch on the wall, the light bulb lights up on the ceiling. Of course you could trace the wires, but usually they are hidden. For all intents and purposes they can be treated as magic: The user does a thing and something happens somewhere.

So back to buttons. A push button makes an electrical connection when it is pressed. When the button is released, the electrical connection is broken again. That can be combined with mechanical contraptions to create, for example, a switch.

In development terms “something happening” is an event that others can subscribe to. The light source would listen to the “light switch closed” event.
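
In plain C# that wiring could look something like this. LightSwitch and LightBulb are illustrative names, not part of any toolkit:

```csharp
// The switch only raises an event; the bulb subscribes. Illustrative names only.
using System;

public class LightSwitch
{
    public event Action? Closed;              // "light switch closed"
    public void Press() => Closed?.Invoke();
}

public class LightBulb
{
    public bool IsOn { get; private set; }
    public void ListenTo(LightSwitch lightSwitch) => lightSwitch.Closed += () => IsOn = true;
}

// Usage: the "wiring" is invisible, cause and effect are spatially decoupled.
// var lightSwitch = new LightSwitch();
// var bulb = new LightBulb();
// bulb.ListenTo(lightSwitch);
// lightSwitch.Press();   // bulb.IsOn is now true
```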

Coffee time!

What I want is UI elements that send a number of events, for example “button pressed” or “button released”. These events are usually triggered by mechanical movement, for example the button knob being pushed into the back plate. But instead of getting a finished UI element with geometry and visuals where I can maybe tweak some values like “button press distance”, I would prefer something more abstract that can be applied to different shapes and concepts.

As I was pressing buttons and sketching around, I was reminded of Trigger Volumes in Unreal Engine. And I was thinking of LEGO. So let’s say a specific component would provide the events, the triggers and the actors that can trigger them (where actors and triggers would work similarly to how they do in UE), and as an artist / developer I could place them where I want in my scene / on my objects (as with LEGO). That would detach the functionality from the visual representation, and the entire principle could be used for pretty much every UI element.

This is what I came up with:

What it means:

Actor: A component that can be added to a mesh or anywhere in space. It’s essentially a bounding volume. It can have an assigned user-defined type.

Action Trigger: A component that can be added to a mesh or anywhere in space. Also a bounding volume. Whenever an actor intersects with or leaves the trigger boundary, it sends an event. It can have assigned actor types that it reacts to.
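
Here is one possible sketch of those two components in plain C#, using simple spherical bounds to keep the intersection test short. Everything here (Actor, ActionTrigger, the Entered / Left events) is hypothetical, not an existing toolkit API:

```csharp
// Hypothetical Actor and ActionTrigger components with spherical bounds.
using System;
using System.Numerics;

public class Actor
{
    public string Type = "default";    // user-defined actor type
    public Vector3 Position;
    public float Radius = 0.01f;       // bounding volume, a sphere for brevity
}

public class ActionTrigger
{
    public Vector3 Position;
    public float Radius = 0.02f;
    public string[] ReactsTo = { "default" };   // actor types this trigger accepts

    public event Action<Actor>? Entered;        // actor entered the trigger boundary
    public event Action<Actor>? Left;           // actor left the trigger boundary

    private bool _inside;

    // Called every frame by some interaction system that tracks the actor.
    public void Update(Actor actor)
    {
        if (Array.IndexOf(ReactsTo, actor.Type) < 0) return;
        bool intersects =
            Vector3.Distance(Position, actor.Position) < Radius + actor.Radius;
        if (intersects && !_inside) Entered?.Invoke(actor);
        if (!intersects && _inside) Left?.Invoke(actor);
        _inside = intersects;
    }
}
```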

Example: Simple button

The button knob can be moved along one direction, toward the back plate.

The knob is the actor, which is pushed by the user into the back plate with the attached trigger.

At the inner end of the knob is the actor (1). The knob is pushed away from the back plate by a spring.

The back plate is the action trigger (2), which is activated once the knob is pushed back and the actor boundary is intersecting with the back plate boundary.

Once they intersect, the trigger sets the isButtonPressed flag to true. This will also send the buttonStateChanged event. Once they part again, the value changes and thus the event is sent again.
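
Building on the hypothetical Actor / ActionTrigger sketch above, the simple button reduces to a few lines of wiring. The spring and the knob’s constrained movement would come from the physics system; only the trigger logic is shown:

```csharp
// The back plate trigger drives the button state; the spring and the knob's
// constrained movement are left to the physics system.
public class SimpleButton
{
    public bool IsButtonPressed { get; private set; }
    public event System.Action<bool>? ButtonStateChanged;

    public SimpleButton(ActionTrigger backPlateTrigger)
    {
        backPlateTrigger.Entered += _ => SetPressed(true);   // knob pushed all the way in
        backPlateTrigger.Left    += _ => SetPressed(false);  // spring pushed the knob back out
    }

    private void SetPressed(bool pressed)
    {
        if (pressed == IsButtonPressed) return;
        IsButtonPressed = pressed;
        ButtonStateChanged?.Invoke(IsButtonPressed);
    }
}
```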

Example: Shutter button

A shutter button is similar to a simple one, except it has multiple press states. Half-pressed would tell the camera to focus, a full press would take a picture.

Same as above, only with another trigger added.

You should see where this is going: Simply add another action trigger halfway (3), which would change the value of isHalfPressed.
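
In terms of the sketch above, that is literally the same wiring with a second, hypothetical trigger:

```csharp
// Same pattern, second trigger placed halfway along the knob's travel.
public class ShutterButton
{
    public bool IsHalfPressed { get; private set; }    // camera starts focusing
    public bool IsFullyPressed { get; private set; }   // camera takes the picture

    public ShutterButton(ActionTrigger halfWayTrigger, ActionTrigger backPlateTrigger)
    {
        halfWayTrigger.Entered   += _ => IsHalfPressed = true;
        halfWayTrigger.Left      += _ => IsHalfPressed = false;
        backPlateTrigger.Entered += _ => IsFullyPressed = true;
        backPlateTrigger.Left    += _ => IsFullyPressed = false;
    }
}
```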

That system is pretty good and flexible so far. Now we add another thing. Just like with electrical sensors, there might be simple binary or ranged variants. A binary variant only knows open / closed aka true / false, which is what we have above. Ranged variants send a continuous value. An example would be a dimmer, setting the brightness between 0 and 100%.

Each dimension of the action trigger volume has a min / max value, from 0.0 to 1.0 by default. If the actor moves through the action trigger, it would change the value and send a respective event.
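
A possible sketch of such a ranged variant, reusing the hypothetical Actor from above: the trigger spans a line segment and projects the actor’s position onto a clamped 0.0–1.0 value.

```csharp
// Hypothetical ranged trigger: projects the actor onto a start/end axis
// and reports a clamped 0.0-1.0 value.
using System;
using System.Numerics;

public class RangedTrigger
{
    public Vector3 Start;   // the 0.0 end of the trigger volume
    public Vector3 End;     // the 1.0 end of the trigger volume

    public float Value { get; private set; }
    public event Action<float>? ValueChanged;

    // Called every frame while the actor is inside the trigger volume.
    public void Update(Actor actor)
    {
        Vector3 axis = End - Start;
        float t = Vector3.Dot(actor.Position - Start, axis) / axis.LengthSquared();
        float clamped = Math.Clamp(t, 0f, 1f);
        if (Math.Abs(clamped - Value) < 1e-4f) return;   // ignore tiny jitter
        Value = clamped;
        ValueChanged?.Invoke(Value);
    }
}
```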

Example: Slider

The action trigger extends from left to right as a component added to a rail (1). The left value is set to 0.0, the right one to 1.0.

The actor is the slider knob while the entire length of the object is a trigger, ranging from 0.0 to 1.0.

The actor is attached to a knob (2), which can slide along the rail.

As the user moves the knob along the rail, the actor intersects at a different location along the action trigger, causing the value to change, which in turn causes a sliderValueChanged event.
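
With the ranged trigger sketched above, the slider needs almost no extra code. Assuming the same hypothetical types, usage might look like this:

```csharp
// The rail defines the ranged trigger, the knob carries the actor.
using System;
using System.Numerics;

var railTrigger = new RangedTrigger
{
    Start = new Vector3(-0.1f, 0f, 0f),   // left end of the rail
    End   = new Vector3( 0.1f, 0f, 0f),   // right end of the rail
};
railTrigger.ValueChanged += value => Console.WriteLine($"sliderValueChanged: {value:0.00}");

var knobActor = new Actor { Position = new Vector3(0f, 0f, 0f) };   // knob sits in the middle
railTrigger.Update(knobActor);   // prints "sliderValueChanged: 0.50"
```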

This can be extended into two or three dimensions with the same concept.

Example: Radial dial

Bend the rail. There is no spoon.

One point on the knob’s surface is the actor. The outer shell of the base plate is the trigger.
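
A minimal sketch of that mapping, assuming the dial lies in a plane and the actor point on the knob sweeps around the base plate’s centre (names and the XY-plane assumption are mine):

```csharp
// Derive the dial value from where the actor point sits on the base plate's
// outer shell: the angle around the dial's centre maps to 0.0-1.0.
using System;
using System.Numerics;

public static class RadialDial
{
    // centre: middle of the base plate; actorPosition: the point on the knob's surface.
    // Assumes the dial lies in the XY plane and sweeps a full 360 degrees of travel.
    public static float ValueFromActor(Vector3 centre, Vector3 actorPosition)
    {
        Vector3 offset = actorPosition - centre;
        double angle = Math.Atan2(offset.Y, offset.X);          // -pi .. +pi
        double normalized = (angle + Math.PI) / (2 * Math.PI);  // 0.0 .. 1.0
        return (float)normalized;
    }
}
```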

Another coffee and maybe some cake

I like this as a super simple way of building user interfaces. It steps away from having things (“This is what a button should be”) and allows you to freely think about what a user should do and then wire triggers to interactions.

You want a literal safety key for some interactions? Add the actor component to the key and some action triggers in a keyhole to then unlock UI options. A crank as a dial? Analog-style wiring connections? A joystick? A gearbox? All easily possible with this concept. Your 3D artists should be able to define the actor and trigger boundaries in their models, which then just need to be wired up.

As said, the concept is not new and your engine of choice will most likely already support it. Probably some of your UI elements are already constructed in a similar way. But I like the explicit “use this as an actor, use this as a trigger, then LEGO it” approach. It’s easily understandable and allows designers, artists and developers to easily construct interactions beyond the usual set of UI elements.

And that for me is the key: To allow UI components that mimic the real world more than the usual set of 2D components we got used to. And to hopefully go beyond the real world into more magical things.

Coffee and cake done. Me happy. 😊

Shout out to Nick for a lot of inspiration and conversations around this.
