Understanding and assessing AR, MR and VR headsets

Dirk Songuer
May 19, 2023

Apple’s Worldwide Developers Conference 2023 (WWDC) is fast approaching, and (among other things) Apple might announce its VR-like headset there.

Apple WWDC 2023 logo

However, how does one assess such a headset, its properties and maybe even its potential? With this article, I want to dive into distinct types of digital headsets, introduce some terminology and talk about challenges and limitations. The goal is to allow you to better assess the potential of individual solutions. This should be useful not just for the (rumored) Apple device, but also for the legion of similar clones that will undoubtedly follow.

First, the usual disclaimer: While the views in this article are mine, please be aware that I work for Microsoft around IoT and Mixed Reality things.

What are we talking about?

As the field of headsets is quite complex, I want to first set some boundaries for this article to keep it manageable.

Humans experience the world via their senses and in principle every one of those can be digitally augmented: Screens & glasses augment the visual sense, headphones the auditory sense, gloves the haptic sense and so on. “Digitally augmented” in this context means that a user is provided with computer-generated information alongside their natural perception of reality. While every type of augmentation is interesting, this article focuses on the visual and the auditory sense.

One challenge when talking about such augmentation is that devices can have quite different approaches to augmentation. These can be bundled into three categories, or modes: Annotation, unification, and replacement. “Annotation” and “unification” are usually further lumped together under the term “Augmented Reality”, while replacement is usually called “Virtual Reality”.

Currently these approaches manifest themselves as device classes, as they deal with distinct technical challenges (see below). But really these are operational modes that achieve different things in different situations and scenarios. So, it’s important to note that no device class / mode is strictly better than the others. They just have different optimal use cases.

I believe that in the future these device classes will converge, and one device will be able to utilize every mode, based on which provides the best user experience in each situation. However, until the technology has matured, they will remain separated.

Mode 1: Second screens aka Heads-up-Displays aka Annotated Reality

The simplest mode is showing information alongside physical reality. A simple scenario is a headset that puts a smartphone screen permanently inside the viewport of the user.

While the visual output is separated from the physical environment, the key is some form of contextual awareness. A system used for navigation might utilize GPS and a compass to know where it is and provide the relevant information about the physical environment on the screen (“You are here, and this is what’s around you”). That way, although the screen is separated from the environment, there is a contextual connection between both.

Mercedes-Benz EQS augmented-reality navigation system

Mode 2: Unified worlds aka Digital Windows aka Mixed Reality

Mixed Reality integrates digital aspects seamlessly into the physical environment. Instead of “screens in the air” the digital elements are integrated into and around physical objects. When done well the user is not put off by the amalgamation of digital and physical elements, and both feel equally natural and intuitive.

Such a system is context aware as well, but on top of that it has spatial awareness. For example, a navigation system would use environmental understanding to highlight the actual lane you need to take and show other relevant information directly on the lane itself.

Dynamics 365 Guides using Azure Object Anchors

Mode 3: World replacement aka Virtual Reality

Virtual Reality completely replaces the perception of the physical environment with a digital representation.

This replacement might be something completely different from the actual environment, transporting the user to another place or world, usually in the context of games or entertainment.

However, the replacement can also show the user an artificially generated representation of the physical environment. This seems similar to Mixed Reality (mode 2), but it works the other way around: The user cannot see the outside world directly. Instead, the physical environment is recreated in a context-specific way, matching physical reality where required, and then displayed to the user.

Meta x BMW Research
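To make the distinction between the three modes a bit more tangible, here is a minimal sketch of how they could be told apart. The two properties used here (`replaces_view` and `spatially_aware`) are my own simplification for illustration, not an established taxonomy, and real devices will not always fit neatly into one bucket.

```python
from enum import Enum


class Mode(Enum):
    ANNOTATION = 1   # mode 1: second screen / heads-up display
    UNIFICATION = 2  # mode 2: unified worlds / Mixed Reality
    REPLACEMENT = 3  # mode 3: world replacement / Virtual Reality


def classify_mode(replaces_view: bool, spatially_aware: bool) -> Mode:
    """Map two hypothetical device properties onto the three modes.

    replaces_view:   the user no longer sees the physical world directly.
    spatially_aware: digital content is integrated into and around
                     physical objects instead of shown on a separate screen.
    """
    if replaces_view:
        return Mode.REPLACEMENT
    if spatially_aware:
        return Mode.UNIFICATION
    # Context-aware output that stays visually separated from the environment.
    return Mode.ANNOTATION


# A navigation overlay that only knows GPS position and compass heading:
print(classify_mode(replaces_view=False, spatially_aware=False))
```

Note that a mode 3 device can still be spatially aware (recreating the physical environment), which is why `replaces_view` is checked first.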

Understanding the challenges for headset devices

Let’s start with hardware considerations. From a technical point of view headsets share the usual challenges of mobile devices: performance, comfort, and convenience.

The more performance a device has, the more energy it requires and the more heat the components produce. Thermal management makes the device bigger and bulkier (heat sinks and active cooling) while more batteries make a device heavier. However, compromising on any of those aspects leads to a sub-par device:

  • Compromising on thermals makes devices warmer. While fine for laptops, you do not want to wear something as hot as your laptop directly on your head
  • Compromising on batteries makes the devices inconvenient to use when the operating time gets shorter than the desired usage scenario
  • Compromising on performance makes the devices just as inconvenient once they get too sluggish or unresponsive to use
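The tension between performance and battery is simple arithmetic: energy equals power multiplied by time. The sketch below only illustrates that relationship. The function names are mine and the numbers in the example are made up for illustration, not measurements of any real headset.

```python
def runtime_hours(battery_wh: float, average_draw_w: float) -> float:
    """Operating time in hours for a given battery capacity and power draw."""
    if average_draw_w <= 0:
        raise ValueError("average_draw_w must be positive")
    return battery_wh / average_draw_w


def required_battery_wh(target_hours: float, average_draw_w: float) -> float:
    """Battery capacity needed to run for target_hours at a given power draw."""
    if target_hours < 0 or average_draw_w < 0:
        raise ValueError("inputs must not be negative")
    return target_hours * average_draw_w


# Hypothetical figures: doubling the power draw halves the operating time,
# or doubles the battery (and its weight) needed for the same usage scenario.
print(runtime_hours(battery_wh=20.0, average_draw_w=5.0))
print(runtime_hours(battery_wh=20.0, average_draw_w=10.0))
print(required_battery_wh(target_hours=8.0, average_draw_w=5.0))
```

Every watt spent on performance also ends up as heat, so the same number shows up again in the thermal budget.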

Unfortunately, the technology used in headsets is so immature that EVERYTHING must be a compromise. There is no option to create an “uncompromising” device, no matter how many resources a manufacturer has, as the technology is simply not there yet.

Form factor considerations

One way around this is to externalize the problem: Put some of the components in an external enclosure or rely on another external system. Looking at the two competitors in the Mixed Reality category, Magic Leap decided to put all processing power into an external device called “Lightpack” that is connected to the headset (“Lightwear”) via cable while Microsoft decided to go for a fully self-contained device.

Left: Magic Leap 2, right: Microsoft HoloLens 2

This way Magic Leap can use more powerful processors and bigger batteries as well as generate more heat because weight and heat are on the user's hip, not on their face. The drawback is that the user needs to be more mindful when handling the device (there are dangling cables involved) and the overall weight is higher (headset + external processing pack). Overall, this is not a bad solution, if these drawbacks are acceptable in the usage scenario.

The form factor can even be configurable, where third party manufacturers can take a device and change the shell to their requirements. An example would be the HoloLens 2 Customization Program, which allows partners to customize a HoloLens 2 device. Further abstracted are hardware platforms, where the solution might not be a device, but rather base technologies that can be used by device builders to create their own solutions quickly. An example would be the Qualcomm Snapdragon XR2 platform.

Software platform considerations

There is also the software side. Assuming a manufacturer has created a device, what software platform are they using? None of the established mobile operating systems (iOS, Android) are really optimized for headsets. Sure, they can run AR applications and offer respective SDKs on smartphones, but that is quite different from running a dedicated headset, especially for modes 2 and 3.

The supposed Apple device will obviously run a version of iOS. Android (AOSP) could be used as a basic platform for devices by other manufacturers, but both still require a custom spatial UI, dedicated interfaces (Hand tracking? Gestures? Voice?), an attractive development environment suited for spatial applications, security, remote management features and so on.

None of those things are mature for spatial use cases yet. In other words: Using a spatial device will be easy neither for developers nor for customers.

Audience considerations

That brings us to the purpose of the headset: Does it have a clearly defined use case and target audience or is it supposed to be a truly general-purpose device?

Vuzix smart glasses, for example, are clearly targeted at enterprise scenarios. Their entire website is about field service, remote assistance, manufacturing, or warehouse solutions. It’s a professional tool you use at work with no ambition to cater to mass market consumers or even early adopters.

Vuzix homepage, May 2023

Microsoft HoloLens 2 does the same thing. Microsoft promotes it “For precise, efficient hands-free work”, and even includes an ROI calculator for organizations. While the branding around it is more fashionable and more like a consumer device, the targeting is clearly enterprise and business.

HoloLens homepage, May 2023

Magic Leap initially had more general-purpose messaging but has since pivoted to an enterprise strategy as well.

Magic Leap homepage, May 2023

A clear target audience is important because with the hardware limitations as they are, it’s easier to develop devices around a clear use case. It’s also relevant as some use cases demand specific device considerations — for example safety certifications in certain enterprise environments like manufacturing or construction.

Remember that due to technological limitations every device must be a compromise right now. It’s easier to design a device for a specific use case where you can build around the challenges. Some compromises are also more readily accepted: enterprise hardware is usually heavier and bulkier because it must meet safety regulations.

It’s important not to evaluate every device based on its “mass market appeal”, but according to the specific use case and scenario it was developed for.

Defining success

Defining success for augmented and virtual reality headsets can be hard. There are extraordinarily successful companies that focus on specific use cases, scenarios, or verticals, which are very profitable and add tremendous value for a narrow niche of customers. Tens of thousands of devices sold might represent a significant portion of that market already and make the solution profitable and a sustainable business.

However, if the aim is a truly general-purpose consumer device, then every compromise turns off yet another potential buyer group.

To even get to such a general-purpose, consumer-grade device, it must meet the following criteria:

  • Price: 1,500 USD or less, comparable to very high-end smartphones
  • Production: the ability to produce up to 15 million units per year, similar to the iPhone X in its first year as a luxury, high-end smartphone
  • Battery life: 6–8 hours, enough to use it for a full day, assuming it’s worn and used during commute, work and relaxation
  • Weight: 200 grams / 7 oz or less, making it heavier than regular glasses (around 20 grams / 0.7 oz), but lighter than a bike helmet. This is already stretching it for full-day use, but acceptable if the ergonomics are done right
  • Inclusivity: it must fit all kinds of head shapes, hairstyles, and personal accessories, and work for users with impaired vision or hearing
  • Ruggedness: it must survive a fall or two, the usual mishandling by friends, kids, and pets, and being tossed into a gym bag
  • Ecosystem: an active developer community, including available variants of the most used smartphone apps, assuming that the goal of a general-purpose, consumer-grade device is to replace the smartphone for most use cases
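If you want to run a device against this list, or against your own thresholds, the criteria translate into a simple checklist. The following is just a sketch: the field names are made up for illustration, the softer criteria (inclusivity, ruggedness, ecosystem) are reduced to yes/no judgments, and I read the production criterion as "can scale to 15 million units per year".

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Headset:
    price_usd: float
    production_capacity_per_year: int
    battery_hours: float
    weight_grams: float
    inclusive_fit: bool       # judgment call: head shapes, hairstyles, impairments
    rugged: bool              # judgment call: survives falls, kids, gym bags
    has_app_ecosystem: bool   # judgment call: developers and key apps are there


def failed_criteria(h: Headset) -> list[str]:
    """Return the names of all consumer-grade criteria the device misses."""
    checks = {
        "price <= 1,500 USD": h.price_usd <= 1500,
        "production capacity >= 15M units/year": h.production_capacity_per_year >= 15_000_000,
        "battery life >= 6 h": h.battery_hours >= 6,
        "weight <= 200 g": h.weight_grams <= 200,
        "inclusive fit": h.inclusive_fit,
        "rugged": h.rugged,
        "app ecosystem": h.has_app_ecosystem,
    }
    return [name for name, passed in checks.items() if not passed]


# A purely hypothetical device, not based on any real product:
prototype = Headset(
    price_usd=3500,
    production_capacity_per_year=200_000,
    battery_hours=2.5,
    weight_grams=560,
    inclusive_fit=True,
    rugged=False,
    has_app_ecosystem=False,
)
for criterion in failed_criteria(prototype):
    print("missed:", criterion)
```

The thresholds live in one place on purpose, so you can swap in your own numbers if you disagree with mine.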

That’s the point, isn’t it? The endgame is to build the next smartphone-scale device and the only chance of doing so is to replace the smartphone. That is the ultimate success case, which is currently prevented by the technical challenges.

To make a bold prediction, I believe that a true consumer device has to be mode 2. Mode 3 won’t be socially acceptable (as it completely replaces vision) and mode 1 is just too similar to a smartphone to warrant a large-scale switch to a new form factor.

In my view, the above is the minimum it would take to even have a chance at creating a large-scale consumer headset. But feel free to disagree and substitute your own criteria and thresholds for viable consumer headset devices.

Now, to be fair there is an interesting alternative where headsets will start as mode 1, augmenting the smartphone. In that case the smartphone becomes an external component of the glasses — a companion device. That was the original idea behind Google Glass. This strategy makes perfect sense if you’re an incumbent smartphone manufacturer: Start with a companion device featuring basic functionalities (similar to a smartwatch) that over time grows into a stand-alone product. This would not cannibalize their earnings and they can build everything on their existing platform, gradually extending it. The bad news is that you’d need to be Apple, Google, Samsung or maybe Huawei to pull this one off.

Again, if your use case is not “replacing the smartphone”, but a very defined vertical, success might be selling 10,000 units plus long-term service and operations contracts. These kinds of devices might look strange at first as they are usually highly optimized for that specific use case. Lugging a 3 kg / 6.6 pound device around might seem absurd until you are a firefighter, and the thing needs to be able to take a beating and still work reliably in 300 °C environments.

Is it real?

This is the hardest one to answer. There are a lot of “future vision” videos and press releases that paint a very… optimistic picture of their solution. A prime example of this would be the original “one day” vision video by Google to promote Google Glass.

Google Glass Project: One day, Google, 2012

We learned very quickly that the actual Google Glass device was nowhere near that vision. I would argue that neither the hardware nor the software of Glass did anything to advance Google towards the vision shown in the video.

That said, I do believe it wasn’t Google’s intent to mislead customers, developers, or investors. There is a need to create such visions to make the goals and ambitions of your solution tangible. However, it is sometimes hard to see behind the curtain and understand what is real (existing product) and what is not (vision).

If in doubt, get a hands-on demo. And make sure it’s not a prototype or engineering sample, but the actual production model.

Summary: Understanding device potential

The above gives us a good framework for understanding and assessing the potential of specific headsets:

  • What mode is it? Is it a second screen, unified world, or world replacement?
  • Is it self-contained or does it rely on other, external components? Is it customizable or a platform?
  • What software platform does it use? A new, self-made one, an adapted version of something existing or a mature, common platform?
  • Does it have a specific use case or is it a general-purpose device? Likewise: Does it have a specific audience or is it fully consumer grade? Does it pass your threshold for viable consumer devices or the regulatory requirements for an enterprise device?
  • And the most important question: Does it actually exist?

Now, what are you interested in? Do you want to assess a device based on its potential to be a consumer-grade, multi-million-unit seller? Or do you want to know how well it will fit a specific enterprise use case? Or analyze which manufacturer has the most forward-looking vision regardless of their ability to execute? Based on that, define what answers you would prefer to see for the respective categories and then assess the device(s).
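One way to put this into practice is to write down your preferred answers once and then compare each device against them. Here is a rough sketch of that idea. The field names and the allowed values are my own shorthand for the questions above, not a formal schema.

```python
from __future__ import annotations

from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Assessment:
    mode: str               # "second screen", "unified world" or "world replacement"
    form_factor: str        # "self-contained", "external components" or "platform"
    software_platform: str  # "new", "adapted" or "mature"
    audience: str           # "specific use case" or "general purpose"
    exists: bool            # is there an actual production model?


def mismatches(device: Assessment, preferred: Assessment) -> list[str]:
    """List the questions where a device deviates from the preferred answers."""
    return [
        f.name
        for f in fields(Assessment)
        if getattr(device, f.name) != getattr(preferred, f.name)
    ]


# What I would look for when assessing consumer potential (my assumptions):
preferred = Assessment("unified world", "self-contained", "mature", "general purpose", True)

# A hypothetical enterprise device used as an example:
device = Assessment("unified world", "external components", "adapted", "specific use case", True)

print(mismatches(device, preferred))
```

A mismatch is not a verdict. It only tells you where to look closer, given what you decided you care about.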

While not perfect, this should offer a robust model that allows you to assess the potential of different headsets and even compare devices and solutions. Not just for the upcoming Apple event, but beyond.

Dirk Songuer

Living in Berlin / Germany, loving technology, society, good food, well designed games and this world in general. Views are mine, k?