Appscore is looking at how to leverage ARKit 3, the latest augmented reality technology, to better accommodate the one in six Australians who will at some point suffer hearing loss.

Hard-of-hearing people must overcome a variety of daily challenges to live normal lives. Currently, they use various forms of communication. These include the widely adopted Makaton system (traditionally supporting those with speech difficulties), as well as Auslan, a uniquely Australian sign language used by 20,000 Australians. However, Auslan is of little use in conversation with people who can't read it.

The Challenge

Appscore’s master developers set out to explore Apple’s brand new iOS augmented reality (AR) framework, ARKit 3, as the foundation for an innovative sign language app that could revolutionise how deaf and hard-of-hearing people experience the world. The iPhone and iPad iOS app would interpret Auslan and translate it into written text by capturing video of people as they sign.

We see this as a solution for those in the hospitality or service industry, who may interact with hard-of-hearing clients frequently, or anyone who wants to be able to understand sign language quickly. The translation tool could open the door for people who are deaf, facilitating activities such as booking a hotel room in person without the aid of a real-life translator.

Reading signs is a skill requiring phenomenal speed and understanding. When trying to recognise meaning in hand gestures, Auslan novices refer to pictures of hands and try to figure out what is being said. This method is limited: without learned experience or current technology, it’s almost impossible to keep up with real signing. Applying context to extract meaning adds another barrier.

The New Technology

Enter ARKit 3, the groundbreaking AR platform for iOS launched at the annual Apple Worldwide Developers Conference (WWDC) this June in San Jose, California. The state-of-the-art tool allows developers to integrate human movement and positioning into their apps simply by using the phone’s built-in camera. ARKit 3 also makes People Occlusion possible, where AR content appears alongside, in front of and behind people for an immersive experience. Other features include multiple face tracking, collaborative session building and a coaching UI for onboarding.

Until very recently, ARKit 3 was available for beta testing only, accessible to developers alone. It has now been released as part of iOS 13, for use on iOS devices that can handle augmented reality. These include the iPhone XR and XS, as well as the 2018 iPad Pro, all of which have A12 Bionic processors.

Appscore is looking to integrate ARKit 3 into an app that will transform how people connect with the world around them, through real-time motion capture capabilities. The tech’s motion capture functionality enables it to understand body position and movement, while Machine Learning facilitates translation for various purposes.
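As a rough illustration of how that motion capture data becomes available to an app — a minimal sketch, not the app’s actual code — ARKit 3 body tracking comes down to running an `ARBodyTrackingConfiguration` and reading joint transforms from each `ARBodyAnchor` the session reports:

```swift
import ARKit
import RealityKit

// Sketch only: feeds hand positions from ARKit 3's motion capture into a
// hypothetical sign-recognition pipeline. Assumes an ARView is already on screen.
class SignCaptureSession: NSObject, ARSessionDelegate {
    func start(on arView: ARView) {
        // Body tracking requires an A12 Bionic device or newer.
        guard ARBodyTrackingConfiguration.isSupported else { return }
        arView.session.delegate = self
        arView.session.run(ARBodyTrackingConfiguration())
    }

    func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
        for case let bodyAnchor as ARBodyAnchor in anchors {
            let skeleton = bodyAnchor.skeleton
            // Hand transforms relative to the skeleton's root joint — the raw
            // per-frame data a sign classifier would consume.
            if let leftHand = skeleton.modelTransform(for: .leftHand),
               let rightHand = skeleton.modelTransform(for: .rightHand) {
                // e.g. append (leftHand, rightHand) to a rolling frame buffer
                _ = (leftHand, rightHand)
            }
        }
    }
}
```

The per-frame joint transforms, buffered over time, are what would give a machine learning model the movement sequences it needs.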

The Process

Our developers will leverage ARKit 3 in five stages to gradually improve the capabilities of the sign language app from basic to sophisticated. The process is as complex as sign language itself; it involves more than just hands! When capturing static signs, it’s difficult to determine their context in relation to the movements performed before and after. Our technology aims, eventually, to do all of this in real time.

This is where the second piece of tech we’re leveraging for the project comes in. Apple’s machine learning framework for iOS, Core ML, will facilitate the training of the model to recognise different signs instantly. This complex piece of artificial intelligence technology will be able to relay letters, words and more with high accuracy, drawing on huge amounts of AR data assimilated from ARKit 3.

To capture the data in the first place, we’ll record a 3D model of somebody signing (hands first, then the rest of the body at a later stage). Once we feed that data into the app, Machine Learning will be able to tell us which words the signs translate to, plus details about the grammar, context and semantics.
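To sketch what the prediction step could look like — with the model file, the "poses" input name and the "classLabel" output name all being illustrative assumptions rather than details of the actual build — Core ML’s generic prediction API can classify a buffered joint sequence like this:

```swift
import CoreML

// Hypothetical sketch: loads a compiled Core ML model trained on sequences of
// hand-joint transforms and asks it for the most likely sign label.
// "poses" and "classLabel" are assumed feature names, not real model details.
func translate(jointSequence: MLMultiArray, modelURL: URL) throws -> String {
    let model = try MLModel(contentsOf: modelURL)
    let input = try MLDictionaryFeatureProvider(
        dictionary: ["poses": MLFeatureValue(multiArray: jointSequence)])
    let output = try model.prediction(from: input)
    // The model's top label, e.g. a letter in phase one or a word later on.
    return output.featureValue(for: "classLabel")?.stringValue ?? "?"
}
```

In practice Xcode generates a typed wrapper class for each bundled `.mlmodel`, which replaces the dictionary plumbing shown here.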

Phase One:

The first stage is to translate the letters of the alphabet. Signing letters is fairly static, so this will be a straightforward way to lay down some groundwork.

Phase Two:

The next step is to build our vocabulary bank with simple-word signs that don’t require too many elements to perform, i.e. those using just the hands.

Phase Three:

The third phase adds signs that involve use and touch of the face, mouth, ears and so on. These are words that are more complex in terms of sign language, rather than the English language.

Phase Four:

Constructing simple sentences. We will take all the words we have translated so far, string them together in consecutive streams and extract meaning from the sequence.

Phase Five: 

The final phase involves building an advanced system able to read nuances in Auslan. These include grammar, different meanings for the same sign, and context in relation to preceding and subsequent signs.

The UI will show a live video feed of the person being recorded as they sign, with text at the bottom displaying the translation in real time. There may also be a small corner graphic of the sign if we determine it to be useful at a later stage, though we are yet to finalise the look.
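The layout described above could be sketched in SwiftUI roughly as follows — `CameraFeedView` here is a hypothetical placeholder for a view wrapping the AR camera feed, not a component of the actual app:

```swift
import SwiftUI

// Rough layout sketch: camera feed filling the screen, running translation
// overlaid at the bottom. CameraFeedView is an assumed stand-in for a
// UIViewRepresentable that wraps the ARView.
struct TranslatorScreen: View {
    @State private var transcript = ""

    var body: some View {
        ZStack(alignment: .bottom) {
            CameraFeedView()          // hypothetical AR camera wrapper
            Text(transcript)          // translation text, updated in real time
                .padding()
                .background(Color.black.opacity(0.6))
                .foregroundColor(.white)
        }
    }
}
```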

Our developers will follow proper Agile methodology principles throughout the process. This means we’ll do the work in small pieces, so we know what is and isn’t possible. If something we try doesn’t work, at least we fail early. We do this to optimise the entire project, improving efficiency and accuracy while minimising downtime.

A real-time sign language translator app is something that doesn’t exist on the global market at the moment, whether on the App Store or beyond. What’s currently an exciting project for our team could eventually pave the way for a revolution in communication.

Follow our latest ventures and PoCs through regular updates on the Appscore blog!