Project “Jarvis”: step one (proof of concept)

Adding Siri to both my old iPad 1 and iPhone 4 was a failure :(
Jailbreaking went fine, but messing with SiriPort was a complete disaster, and it took me nearly two hours to turn these devices back into something other than bricks.

And thus… no SiriProxy for me. But then again, why should I mess with existing closed-source crap when I can build my own stuff? Hmm?

Project “Jarvis”

Here comes Project “Jarvis”. OK, the name sucks… I should stop watching those Marvel movies. And the logo is no more than a copy of Siri’s own logo, with a touch of Raspberry color. I’ll work on those later: for now, it is time to put the ideas behind this project to the test.

The principles are quite simple:

  • 1 – A mobile app records a simple question and sends it to the Raspberry Pi
  • 2 – The Raspberry Pi converts the recorded voice into something Google’s Speech API understands and pushes the result to it
  • 3 – Google’s Speech API returns its voice-to-text interpretation as a JSON data structure
  • 4 – The Raspberry Pi parses the data, builds something out of it and sends its answer back to the mobile app (and eventually to a Home Automation system)
  • 5 – The mobile app prints out the answer to the question
  • 6 – Applause and tears of joy

Proof of concept

First, let’s record a simple question. “Quelle heure est-il ?” (What time is it?) will be a good start.

Then, let’s send it to the Raspberry Pi:

scp heure.caf root@applepie:/opt/jarvis

In order to get it interpreted by Google’s Speech API, one has to convert the recording from Apple’s CAF (Core Audio Format) to the much more standard FLAC format:

apt-get install ffmpeg
ffmpeg -i heure.caf heure.flac

Let’s send it to Google Speech API:

curl -i -X POST -H "Content-Type:audio/x-flac; rate=44100" -T heure.flac "https:

After a second or two, I got the answer from Google:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Date: Sun, 10 Feb 2013 22:50:42 GMT
Expires: Sun, 10 Feb 2013 22:50:42 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked
{"status":0,"id":"f75093db420033490c2424cdb58de963-1","hypotheses":[{"utterance":"quel heure est il","confidence":0.61982137},{"utterance":"quelle heure est il"},{"utterance":"quel temps fait il"},{"utterance":"quelle heure est-il"},{"utterance":"quel temps va til"}]}

Not bad 😎
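Since the answer is plain JSON, pulling out the best hypothesis only takes a grep and a sed (a quick sketch, assuming the raw response body was saved into a shell variable beforehand):

```shell
# Raw JSON answer from Google, stored in a variable (shortened here)
response='{"status":0,"hypotheses":[{"utterance":"quel heure est il","confidence":0.61982137},{"utterance":"quelle heure est il"}]}'

# Keep only the first "utterance" value, i.e. the most confident hypothesis
echo "$response" | grep -o '"utterance":"[^"]*"' | head -n 1 | sed 's/.*:"\(.*\)"/\1/'
# prints: quel heure est il
```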

Polishing up

First, let’s write a few lines of PHP on the Raspberry Pi (see the previous post for the details of the Nginx/PHP installation):

  • to trigger the ffmpeg conversion
  • to send the converted FLAC recording to Google’s speech-to-text engine
  • to get the JSON data structure back
  • to parse the JSON result (PHP’s json_decode() would do)
  • to send back a well-thought-out answer to the question
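The last bullet is where Jarvis earns its keep. As a minimal sketch of that answer-building step (in shell rather than PHP, with a made-up answer() helper), matching the recognized text against known questions is enough for now:

```shell
# Hypothetical answer logic: match the recognized utterance against the
# questions Jarvis knows about, and fall back to an apology otherwise
answer() {
  case "$1" in
    *heure*) date +"Il est %Hh%M." ;;            # "quelle heure est-il ?"
    *)       echo "Désolé, je n'ai pas compris." ;;
  esac
}

answer "quel heure est il"
```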

Then, let’s fire up Xcode and, with the help of the Core Audio API documentation, write down a few lines of Objective-C.

Pretty cool for two hours’ work 😎

Now what?

I guess the proof of concept is conclusive :)

Now, the catch is that it is not exactly fast. Almost… as slow as Siri.

The exchange with Google is the bottleneck. Also, I’d rather not depend on a private external API. I guess one of the next steps will be to see how PocketSphinx would fit into this project.

The CAF-to-FLAC conversion could also be done on the iOS side of the project. I’ll check that out later.

Also, Jarvis is literally speechless. Adding a few “text-to-wav” capabilities shouldn’t be too hard, since espeak and festival are already packaged by Raspbian.
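For instance, espeak can write a WAV straight to disk. The snippet below only builds a hypothetical reply sentence; the actual espeak call (where -v fr selects the French voice and -w names the output file) is left commented out, since it only makes sense on the Pi itself:

```shell
# A hypothetical sentence Jarvis would speak back
text="Il est vingt-deux heures cinquante."
echo "$text"

# On the Raspberry Pi, espeak would then turn it into a WAV file:
#   espeak -v fr -w reponse.wav "$text"
```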

Then, of course, I’ll have to put a bit of thought into Jarvis’s brain (text analyzer) and hook the Raspberry Pi to some kind of Home Automation system.

And the iOS part needs a lot of looooove.

But I guess, that’s enough for a first step.
