
Injecting Intelligence – Building Apps using Microsoft Cognitive Services

10 Jul 2016 · CPOL · 17 min read
In this article, you will learn to build apps using Microsoft Cognitive Services.

Image 1

Technology seems to be moving ever faster, especially in the world of Artificial Intelligence and Machine Learning. Barriers to entry are breaking down, and huge cloud offerings from all the major suppliers are popping up left, right and center. We see these on our devices (Siri, Cortana, Google Now) and in our browsers, yet to date it all seems to be used mainly for one thing: ads (joy).

So with the skill requirements at an all-time low, just about anyone can dive in and start making use of all the “intelligence” offerings that are available and start building intelligent solutions for tomorrow.

Enter Re:Cognition

One of the big things that brought this to the forefront for me was the Re:Cognition event (hosted by Moov2 and Microsoft), a large hack-a-thon style event aiming to get teams crunching on Microsoft’s new “Cognitive Services” offering, which is an entire suite of APIs with a plethora of features to bring intelligence into your apps & games. Best of all, these are all simply REST API endpoints, meaning they are available from anywhere and on any device, so long as you have an internet connection.

Microsoft Cognitive Services

Organised by Moov2 and supported by Microsoft

The event itself took place over a single weekend, with the teams challenged to build what they could using Microsoft’s new Cognitive offering by whatever means necessary. Additionally, several other devices such as Microsoft Surfaces, Microsoft Kinects, Intel RealSense cameras and even some snazzy Spheros were freely loaned out to use (but sadly not to keep; there were, however, prizes for the most notable projects).

You can see some of the highlights here:

Image 4

Microsoft Cognitive Services

Image 5

As you can see, there is a multitude of different services available, the majority of which are even testable on the Microsoft Cognitive site using a simple test page harness (remember, these are all just REST APIs after all).

Each API is specific in its focus, ranging from:

  • Emotion APIs, which can sense what emotion a person’s face shows in an image
  • The Bing Speech API, which will take some recorded audio and turn it into text (or vice versa) and deliver sentiment on the text
  • Linguistic analysis, which can learn to react and respond to text input, providing a high level of interaction between the user and some intelligence
  • Search capabilities, which can be customized and filtered to your service’s particular needs
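Since every one of these services is just a REST endpoint, the calling pattern is the same across the whole suite: an HTTP POST with your subscription key in an `Ocp-Apim-Subscription-Key` header and a JSON payload. Here is a minimal sketch in Python; the endpoint URL and key are placeholders for illustration, not values to rely on:

```python
import json
import urllib.request


def build_cognitive_request(endpoint, key, payload):
    """Build a POST request for a Cognitive Services REST endpoint.

    Every service in the suite follows the same shape: a JSON body,
    with the subscription key in the Ocp-Apim-Subscription-Key header.
    """
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/json",
            "Ocp-Apim-Subscription-Key": key,  # your key from the portal
        },
        method="POST",
    )


# Placeholder endpoint and key -- swap in a real service URL and your own
# key, then send the request with urllib.request.urlopen(req).
req = build_cognitive_request(
    "https://api.projectoxford.ai/vision/v1.0/analyze",
    "YOUR-KEY-HERE",
    {"url": "https://example.com/photo.jpg"},
)
```

The same helper works for any of the services above; only the endpoint URL, the key and the payload change.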

There is so much more, and a lot of content to dive through; however, most of it can be learned in a few minutes, and once you have learned one API, it is even easier to learn the next.

All the above was crucial for the Re:Cognition event, as the “hackers” only had 24 hours in which to create “something”, with most devs not even being aware of the services’ capabilities or usage at the beginning.

And that is not all: each and every service has a free tier, so there is NO COST involved in getting started with any of these services. Just sign up, get your access keys and off you go.

What Has Come Before

The starting point for most teams, after a short brief and some teachings on “how to be a better parent”, was the example applications that Microsoft and some community developers have created since the services were announced (previously known as Project Oxford). The full list of demo apps (some of which went viral a short while ago) can be found here:

Image 6

And many, many more! Check them out.

On Your Marks, Get Set, GO…

This was by and large a very brutal hack, mainly for the organizers and supporters (myself included), with the usual teething troubles of getting machines set up, devices connected and wires plugged in. One thing no team had any trouble with was the Cognitive Services themselves; those were simply a breeze.

Here’s an example that you can walk through yourself:

1: Get on the Services

Navigate to the Microsoft Cognitive Services site and pick a service, for example, the Computer Vision API:

Image 7

2: Get Your Keys

If it’s orange, click it. Just click Get Started for Free, then click Let’s GO and sign in using your Microsoft account. From here, you can choose which APIs you want access to (select them all if you like). For now, select the Computer Vision – Preview offering (shown below), tick the mandatory I AGREE TO SELL MY SOUL (just kidding) Terms and Conditions checkbox, decide if you want more information from MS (not mandatory) and then click Subscribe:

Image 9

As you can see, the free tier gives you a very generous quota for the API, which is fantastic for developing with.

Just remember NOT to share your keys or you might hit trouble.

Once complete, you’ll be presented with the key management screen, where you can grab your keys (you get two) and view the usage of each service you have subscribed to.

Image 10

3: Start Your Engines

Copy your key to your clipboard and let’s navigate over to the API documentation to see how the endpoints are used. Go back to the service’s page and click on the API Reference button:

From here, you can see the documentation for each of the service’s endpoints: sample URLs, payload definitions and the all-important response packets, full of interesting detail returned by the service:

Image 11

You can peruse the docs and learn how the service works, or (if you are like me) you can just jump in and see. If you click on the Open API Testing Console button (which most APIs have, although a few are still in the works or are too complicated for such a simple test harness), you will be taken to a very easy to use web interface for the API:

Image 12

As you can see, this is a very simple interface; in fact, just to get going with the Computer Vision API, you need only enter two things: your API key (which should still be in your copy buffer) and the URL of an image to scan. Enter your key into the nice red Ocp-Apim-Subscription-Key field first.

**Note: the keys are not interchangeable between services. If you enter a bad key, or the key from another service, you will get an “Access unauthorized” message when you try to call the service. I lost count of how many times that query came up on the night.
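When you move from the test console to code, the HTTP status code is what tells you which of these situations you are in. A small illustrative helper (the code-to-cause mapping below follows standard HTTP semantics plus the “Access unauthorized” behaviour described above; it is not an official Microsoft error table):

```python
def diagnose_status(code):
    """Map an HTTP status code from a service call to a likely cause."""
    hints = {
        200: "Success -- parse the JSON body.",
        401: "Access unauthorized -- a bad key, or a key from another service.",
        403: "Forbidden -- you may have exhausted your free tier quota.",
        429: "Too many requests -- slow down and retry later.",
    }
    return hints.get(code, "Unexpected status %d -- check the API docs." % code)


print(diagnose_status(401))
```

Checking this one code first would have answered most of the support queries we fielded on the night.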

Next, you’ll need to enter your request body parameters, which is just a simple JSON string in the field below:

Image 13
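Behind the console, the body is just a JSON object holding the image URL, with the features you want passed as a query-string parameter. A sketch of how that request is assembled (the parameter names follow the Computer Vision analyze endpoint as I understand it; the image URL is a placeholder):

```python
import json
import urllib.parse

# The features to extract are query parameters, not part of the body.
params = urllib.parse.urlencode({"visualFeatures": "Categories"})
url = "https://api.projectoxford.ai/vision/v1.0/analyze?" + params

# The body itself is just the image URL wrapped in a JSON object --
# exactly the string you paste into the console's body field.
body = json.dumps({"url": "https://example.com/happy-chap-in-park.jpg"})

print(url)
print(body)
```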

The image I chose was of a very happy chap in the park on a sunny day; I wondered what the service would make of this (I can’t tell you how many API calls I burnt through looking for an interesting Creative Commons image).

Image 14

4: Checking Your Results

So off went the request. As I only asked it to identify categories for the supplied image (be sure to check the API documentation for all the other parameters you can set to get more information about the content you submit), once it was complete, I got some interesting data back:

Image 15

Nice, simple and easy to read, and with the right JSON decoder, a very simple dataset to understand. But what is it telling us?

After the service analyzed the image, it found three categories of information about it, and for each one it also tells us the degree of confidence it has in that discovery, so we have:

  • It reckons there is a 40% chance that there are people in this image, which seems fairly confident. If you were writing a security app, this becomes very useful.
  • It believes the shot is outdoors, but with only a 0.39 confidence score (a 39% chance) that it’s right. Still, the fact it appears at all means it has some confidence.
  • It has also classified this as having a 1% chance of being a general / others image. To be honest, I’m not sure what this means; you’d need to look it up. Image 16
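Picking the most likely category out of that response is a one-liner once the JSON is decoded. A sketch using an illustrative response shaped like the one above (the category names and scores here are made up to mirror the discussion, not copied from a live call):

```python
import json

# Illustrative response, mirroring the three categories discussed above.
raw = '''{
  "categories": [
    {"name": "people_", "score": 0.40},
    {"name": "outdoor_", "score": 0.39},
    {"name": "others_", "score": 0.01}
  ]
}'''

data = json.loads(raw)

# The service reports a confidence score per category; take the highest.
best = max(data["categories"], key=lambda c: c["score"])
print("%s (%.0f%% confident)" % (best["name"], best["score"] * 100))
# -> people_ (40% confident)
```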

Granted, this was only categorizing the image. If you repeat the above steps, but this time use the Describe image endpoint (top left part of the screen):

Image 17

Then we get much more detail about what Cognitive Services thinks of the image we are seeing.

Image 19

As you can see, this is a lot more detailed, and you even get the service’s reasoning as to what you are looking at. It starts to get very spooky, doesn’t it?

I like the fact that the guy is wearing a funny hat and that the API believes he is skiing. Close, but no cigar.
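The describe response carries those guesses as a ranked list of captions, each with its own confidence, alongside a set of tags. A sketch of pulling the best caption out; the caption text, tags and confidence below are invented stand-ins, and the response shape follows the Computer Vision describe endpoint as I understand it:

```python
import json

# Invented stand-in for a describe response; the real call returns a
# "description" object containing "tags" and ranked "captions".
raw = '''{
  "description": {
    "tags": ["person", "outdoor", "hat"],
    "captions": [
      {"text": "a man wearing a funny hat skiing", "confidence": 0.52}
    ]
  }
}'''

desc = json.loads(raw)["description"]

# Take the caption the service is most confident about.
top_caption = max(desc["captions"], key=lambda c: c["confidence"])
print(top_caption["text"])      # the service's best guess at a sentence
print(", ".join(desc["tags"]))  # supporting tags for the image
```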

Have a try with your own keys and various other images and see what you get back.

Back to the Competition

So, now that you have a feel for what the teams had at their disposal, what did they come up with? I can tell you now that even my wildest dreams did not prepare me for some of the potentially world-changing solutions the teams put together, even those who didn’t manage to finish. Well, we saw the beginnings of solutions; this was only 24 hours, after all.

You can check out all the results (and lots more going on behind the scenes) on Moov2’s YouTube page: https://www.youtube.com/user/Moov2com/videos

Be sure to also click each of the images below to see the 3-minute presentation of each project.

The Sign Language Interpreter

One bold team of four, who stayed through most of the night (collapsing at about 4am), had the dream of taking the vision set of APIs and mashing them together with a Microsoft Kinect to recognize a person doing sign language, then interpreting it into English. In the short time they had, they managed to reliably recognize three words with a high degree of confidence. The team also had plans to tie in the Speech APIs in order to speak the results back, but just ran out of time. Given this was done end to end in less than 24 hours, it is simply amazing:

Image 20

Two Men With Beards and Too Much Time on Their Hands

One of the most riotous teams, who got the audience really stoked during their presentation, built not just one but TWO projects. Their first app was a simple implementation of a “Beard or Not” system that could skillfully tell whether the user in front of the camera had a beard (obviously very useful in some places). Given that this was “all too easy”, they then went on to make a game using the Emotion API where the player had to express certain emotions for tiles that fell down lanes; the more you matched, the higher your score (and it’s not as easy as it sounds). Best of all, this just used their laptop’s built-in camera, so I hope to see it come out soon! The crowd went wild on this one.

Image 21

You don’t score if you’re not angry; you wouldn’t like me when I’m not angry.

Edward – The Dreamer

Just to show that you can still win big when you go it alone at hacks, Edward rolled forth to tackle one of the biggest issues in working life: stopping yourself from venting at colleagues over email. Using the Sentiment API, he built an entire Gmail plug-in which checks your email before sending; if it feels you are being too aggressive or nasty, it pops up and asks you to take five minutes / sleep on it before sending the mail for real. This could seriously be a lifesaver for me in my office at times.

Image 24

Movember is Coming, Are You Ready?

Every year, it seems, lots of gentlemen (and some ladies?) take on the challenge of growing as much facial hair as possible; however, to date there hasn’t been an accurate way to automatically grade the level of foliage present on a person’s face. Until now!

Yet another lone dev built a full web app where users can submit their images and have Cognitive Services grade and rank a person’s follicle growth on their mush, the winner walking away with a nice cool beer. I have some suspicions, though, as the current generation of the app could be fooled by having a dazzling lady hold a lanyard over your face in your pic.

Image 26

Resisting the Temptations of Easy Money

Now, if you have a sufficiently advanced computer, a camera and two wicked brains, your mind might turn to the first project this next team turned to: using the advanced OCR and language recognition features provided by Cognitive Services, paired up with the Bing Search APIs, to build a program capable of cheating at quiz-style pub machines, instantly giving you the answer to any question from a choice of four. The team swiftly dodged that likely unlawful bullet and then spent the rest of their hack coming up with a way to use the features they had already built to read children’s stories and extend them. This pair of knights deserves special recognition for their gallantry in turning away from temptation.

Image 27

Your Personal Jarvis – Become Your Own Iron Man (Minus the Suit)

Home automation always seems like a “nice thing to have”, except that most offerings and projects only go so far. However, this one-man band had the grand ambition of pairing up multiple devices and inputs, carefully interwoven with Cognitive Services, to produce a semi-intelligent home manager: capable of detecting your mood, controlling the lighting / music and even entertaining your pet for you (with the addition of a Sphero). It can even construct its own speech, letting you know it’s looking after you and offering suggestions to perk up your day. Future versions will also watch your seating / walking position and offer handy tips to improve your general health. Not sure whether to be excited or very, very scared. Watch and listen to the video carefully.

Image 29

Your Food, Your Way, Your Choice

As one of the few teams who actually managed to go almost the entire night, this proud set of adventurers has obviously been to many parties where no one can decide what to order to eat. So they set themselves the challenge of providing an interactive whiteboard that lets everyone at a party shout out what they would like to eat (recognizing individuals so you don’t get duplicates) and then orders the most popular item. Almost acclaimed as the “Tinder for food” (which raised a few eyebrows), the team pulled off a working demo and, with mere minutes of sleep before the presentation, showed their creation to the world. Certainly a fun team to work with.

Image 30

If you also check the intro video for the event, these were the culprits behind the famous Ballmer and Gates boogie video, which they left running on the Surface Hub they had in their room. I still have gory flashbacks to that night because of this, grrr.

Are You Chatting to Me?

Definitely going down as the team with the most “on the spot” Twitter prizes, mainly because they spent every waking minute on Twitter. This team pulled together, and even deployed, an actual Twitter bot which detects the emotion behind your tweet and then replies to you accordingly. They used and tested the app throughout the night, catching most of the support staff at apt times. They even managed to get a selfie with Bill Gates, but I must have missed him.

Image 32

Face Lock for Apps

Are you paranoid in your office? Do you regularly walk away from your computer, only to find that your colleagues have walked up and posted some unscrupulous photos of you on Facebook? Well then, this team is here to save you. Their creation allows you to lock not just your computer but also specific apps, by recognizing you in front of the camera; the difference being that it works with any camera, not just some uber-powerful depth camera. The team hopes to extend this, adding facial passwords through emotion tracking and even voice support. Certainly one to watch in the future.

Image 33

Interacting Only When Present

One of the most ambitious teams sought to marry two competing technologies to answer a fairly difficult problem: only enabling access to a device when the right person is there. Sure, you can hide behind passwords and the like, but it’s more achievable when there are multiple systems in play. The team wired up a single Kinect device (far left) for long-range positional detection, together with an Intel RealSense device for close-up interaction, to ensure that only when a valid person is in view would the close-access security system activate. The big win was that they did this on a single machine, whereas traditionally multiple machines would be required, thus lowering the cost of entry. The team certainly has bold and ambitious plans to take this prototype forward; the fact they actually got it working in a demo environment is a testament to their technical skill.

Image 34

Who Are You? – Game of Thrones

Another nice little fun project came from yet another one-man band, who simply loves Game of Thrones (but who doesn’t?) and decided to build a fun web application using Cognitive Services to compare your face to a multitude of Game of Thrones characters from various seasons, letting you know who you most look like. This definitely brought on some cheers and laughter throughout the demo, and it’s one to watch for in the future as the catalogue of characters expands.

Image 35

Yes Boris, you and Ned Stark certainly have a LOT in common!

Being Creative Helps

Do you often wonder what you could make for dinner with what you have in your kitchen? Well, this highly professional team, with obviously far too much time on their hands for a hack, brought together the most polished presentation (including a sales video? seriously, guys?) to showcase their new app Luigi, your personal chef.

Their app listens to what ingredients you have at your disposal and then tells you what dishes you can make with them, plus instructions on how to make them, all with an authentic Italian / English accent. This is definitely a video to watch, and a well-honed team to look out for. I have no doubt that Luigi will be hitting our devices very soon.

Image 36

The support team was quite taken aback by just how polished and professional the team’s output was; granted, having a very experienced artist/designer on hand doesn’t hurt. But only one of the team managed to survive the entire night.

Closing Thoughts

I wasn’t quite sure what to expect from this event and, like others on the event team, I was very interested to see what ideas came from this hack. We were still taken aback by all the ideas and creativity that resulted from the very short 24 hours (granted, it was more like 40 hours for me, since I survived the entire night as the night owl, mostly because I got too tired to sleep).

There were several more teams involved who are not listed here. Some didn’t manage to finish but still stood up to talk about their experiences, some didn’t quite reach their goals with Cognitive Services, and others sadly didn’t return. One team I wish had come back had an interesting idea for an elderly watchdog / alarm system that would monitor an elderly person in their home, answer their commands and, more importantly, be there when things went wrong and contact the emergency services automatically. It sounded amazing, but sadly they hit a lot of technical issues early on which hampered their hackfest.

If you see or hear of another one of these events, I urge you to drop in and give it your all; the community spirit at this hack was truly amazing. Failing that, dig into the Cognitive Services provided by Microsoft and hack your own idea together. With all this easy-to-access machine learning, there are certainly some great opportunities to be had.

What Comes Next

Well, the journey with Cognitive Services doesn’t stop here. I’ll be continuing this blog series with some investigations into Cognitive Services of my own through various paths, from getting started with Unity on Cognitive Services to using the newly released .NET Core.

Time to get more hacking done!

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)


Written By
Architect ZenithMoon Studios
United Kingdom
Long-time game developer / IT maniac.
By day I architect, design, build and deliver enriching Mixed Reality solutions to clients, bringing the work of AR/VR to light in new and interesting ways, by night I Masquerade as the Master Chief of ZenithMoon Studios, my own game development studio.

At heart, I am a community developer breaking down lots of fun and curious technologies and bringing them to the masses.

I'm also a contributor to several open-source projects, most notably, the Reality Toolkit and all the services provided by the Reality Collective, The Unity-UI-Extensions project, as well as in the past the AdRotator advertising rotator project for Windows and Windows Phone.

Currently, I spend my time fulfilling contracts in the Mixed Reality space (primarily for an XR experience firm called Ethar), writing books, technically reviewing tons of material and continuing my long tradition of contributing to open-source development, as well as delivering talks, but that goes without saying :-D

Mixed Reality MVP, Xbox Ambassador, MS GameDevelopment Ambassador & Best selling author:

[Accelerating Unity Through Automation](https://www.amazon.co.uk/Accelerating-Unity-Through-Automation-Offloading/dp/1484295072/ref=rvi_sccl_3/262-0817396-1418043)
[Mastering Unity 2D Game Development](https://www.packtpub.com/game-development/mastering-unity-2d-game-development)
[Unity 3D UI Essentials](https://www.packtpub.com/game-development/unity-3d-gui-essentials)
