Project “Jarvis”: step one (proof of concept)

Adding Siri to both my old iPad 1 and iPhone 4 was a failure 🙁
Jailbreaking went nice, but messing up with SiriPort was a complete disaster, and it took me nearly 2 hours to turn back these devices into something different than a brick.

And thus … no SiriProxy for me. But then again, why should I mess with existing closed-source crap, when I can build my own stuff ? Hum ?

Project “Jarvis”

Here comes Project “Jarvis“. Ok, the name sucks… I shouldn’t watch these Marvel movies. And the logo is no more than a copy of Siri’s own logo, with a touch of Raspberry color. I’ll work on these later: now, it is time to proof check the ideas behind this project.

The principles are quite simple:

1 – A mobile App is used to record a simple question and send it to the Raspberry Pi
2 – The Raspberry Pi transforms the recorded voice into something understandable by Google’s Speech API and push the result to it
3 – Google Speech API returns back its voice-to-text interpretation as a JSON data structure
4 – The Raspberry Pi parses the data, builds something out of it and sends back its answer to the mobile App (and eventually to a Home Automation system)
5 – The mobile app prints out the answer to the question.
6 – Applauses and tears of joy

Proof of concept

First, let’s record a simple question. “Quelle heure est-il ?” (What time is it ?) will be a good start:

Then, let’s send it to the Rapberry Pi:

1	scp heure.caf root@applepie:/opt/jarvis

In order to get it interpreted by Google’s Speech API, one as to convert the record from Apple’s CAF (Core Audio Format) to the much more standard FLAC format:

1 2	apt-get install ffmpeg ffmpeg -i heure.caf heure.flac

Let’s send it to Google Speech API:

1
2
3

curl -i -X POST -H "Content-Type:audio/x-flac; rate=44100" -T heure.flac "https:
//www.google.com/speech-api/v1/recognize?xjerr=1&amp;client=chromium&amp;lang=fr-FR&amp;maxr
esults=10&amp;pfilter=0"

After a 1 or 2 seconds, I got the answer from Google:

HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Date: Sun, 10 Feb 2013 22:50:42 GMT
Expires: Sun, 10 Feb 2013 22:50:42 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked
 
{"status":0,"id":"f75093db420033490c2424cdb58de963-1","hypotheses":[{"utterance":"quel heure est il","confidence":0.61982137},{"utterance":"quelle heure est il"},{"utterance":"quel temps fait il"},{"utterance":"quelle heure est-il"},{"utterance":"quel temps va til"}]}

Not bad 😎

Polishing up

First, let’s write a few lines of PHP on the Rasperry Pi (see previous post for the details of the Nginx/PHP installation):

to trigger the ffmpeg conversion
to sent the converted FLAC record to Google’s speech-to-text engine
to get the JSON data structure back
to parse the XML result (a few regexps would do)
to send back a well thought answer to the question

Then, let’s fire up XCode, and with the help of the Core Audio API documentation, let’s write down a few lines of Objective-C:

Pretty cool for a 2-hours work 😎

Now what ?

I guess the proof of concept is conclusive 🙂

Now, the trick is that is not exactly fast. Almost … as slow as Siri.

The exchange with Google is the bottleneck. Also, I’d rather not depend on a private external API. I guess, one of the next step will be to see how would PocketSphinx fit into this project.

The CAF-to-FLAC convertion could also be done on the iOS side of the project. I’ll check out this project later: https://github.com/jhurt/FLACiOS.

Also, Jarvis is litterally speechless. Adding a few “text-to-wav” functionalities shouldn’t be too hard since espeak or festival are already packaged by Raspbian.

Then, of course, I’ll have to put a bit of thought into Jarvis’s brain (text analyzer) and hook the Raspberry Pi to some kind of Home Automation system.

And the iOS part needs a lot of looooove.

But I guess, that’s enough for a first step.

Project “Jarvis”: step one (proof of concept)

1 thought on “Project “Jarvis”: step one (proof of concept)”

Leave a Reply Cancel reply