Project “Jarvis”: step five (look at me)
Tonight, I’ll go for a very simple hack: connect a webcam, detect motion and stream a live feed over HTTP. Not sure how it’ll fit with Project “Jarvis”, but who knows…
Hardware
First things first: the hardware. For some reason, the only webcam I had was an old Apple iSight. You know, the old FireWire one… Since there is no IEEE 1394A port on the Raspberry Pi, I had to buy a new one.
I settled for a Hewlett-Packard HD-2300 USB webcam. I took a chance, since it was not on the hardware compatibility list, but it was available, reasonably priced for its category, and didn’t look too bad (I know, it is silly, but it is actually one of my buying criteria):

Nevertheless, it appeared right away:
```
root@applepie /etc/motion # lsusb
Bus 001 Device 002: ID 0424:9512 Standard Microsystems Corp.
Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
Bus 001 Device 003: ID 0424:ec00 Standard Microsystems Corp.
Bus 001 Device 004: ID 050d:1102 Belkin Components F7D1102 N150/Surf Micro Wireless Adapter v1000 [Realtek RTL8188CUS]
Bus 001 Device 005: ID 050d:0234 Belkin Components F5U234 USB 2.0 4-Port Hub
Bus 001 Device 006: ID 04b8:0007 Seiko Epson Corp. Printer
Bus 001 Device 007: ID 03f0:e207 Hewlett-Packard
root@applepie /etc/motion #
```
A subsequent “lsusb -v” command gave me nice details about the webcam. Cool
Setting up “Motion”
Now, the software part. Motion is a nice piece of Open Source software that (among other things):
- Does motion detection (and optionally records video and/or frames whenever motion is detected)
- Takes timed snapshots regardless of motion detection.
- Streams live video over IP in MJPEG format.
The installation was pretty easy:
```sh
apt-get install motion
```
I only changed a few settings. Namely:
- I turned “start_motion_daemon” to “yes” in /etc/default/motion to enable the daemon
- I switched “locate” to “on” in /etc/motion/motion.conf to draw a box around the area where motion is detected
- I switched “webcam_localhost” to “off” in /etc/motion/motion.conf to enable access to live streams from anywhere on the LAN
- I tweaked the “text_left” setting to let the message “ApplePie” appear on the bottom left of the streams
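I made these edits by hand, but the same kind of change can be scripted. Here is a minimal sketch, assuming motion.conf uses plain `key value` lines; the `apply_overrides` helper and the sample text are made up for illustration and are not part of Motion itself:

```python
import re


def apply_overrides(conf_text, overrides):
    """Return conf_text with every `key value` line replaced by its override."""
    remaining = dict(overrides)
    out = []
    for line in conf_text.splitlines():
        # A setting line starts with an identifier followed by whitespace;
        # comment lines (starting with # or ;) never match.
        match = re.match(r"^\s*([A-Za-z_0-9]+)\s+", line)
        key = match.group(1) if match else None
        if key in remaining:
            out.append(f"{key} {remaining.pop(key)}")
        else:
            out.append(line)
    # Append any setting that was not present in the file at all
    out.extend(f"{key} {value}" for key, value in remaining.items())
    return "\n".join(out) + "\n"


# Hand-written sample, not the real default file
sample = (
    "# Restrict webcam connections to localhost only\n"
    "webcam_localhost on\n"
    "text_left CAMERA %t\n"
)
patched = apply_overrides(
    sample, {"webcam_localhost": "off", "text_left": "ApplePie"}
)
print(patched)
```

To use it for real, one would read /etc/motion/motion.conf, pass its content through the function and write the result back (after making a backup).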
I left all the other settings as they were (including frame rates and resolution, as this was only a test), and fired up the daemon:
```sh
/etc/init.d/motion start
```
I pointed Firefox at http://applepie:8081 (8081 being the default port used by motion), aimed the webcam at my “Forbidden Planet” poster and shook the cam a bit to simulate motion:

Not bad
It thought that Robby the robot moved. Now, that’s kinda cool
I guess that’s it for tonight (and it was quite easy, as it is widely documented on many other web sites).
Note: for some reason, I have not yet received a password for the account I requested on the elinux.org wiki, even though I got the e-mail address confirmation message and activated the account. I guess I’ll update the Raspberry Pi compatibility list later.
Published by: Fred on February 20th, 2013 | Filed under Free Software, Raspberry Pi
Project “Jarvis”: step four (GUI)
Here comes another step in the design of project “Jarvis”. For once, let’s not be too geeky. I feel artsy today. I’ll focus on the GUI and fire up Inkscape and the GIMP. Yeah, still a bit geeky, I know…
Jarvis-like GUI
Let’s try the first idea that comes to mind: create an interface … just like Marvel’s Jarvis. I googled a bit, watched a couple of scenes from the movies and came up with a first rendering:

And it sucks!
Yes, Jarvis in the movies looks badass. But as a GUI, no matter how hard I tried, there is no way it can be useful for anything at all.
Siri-like GUI
The second idea would be a Siri-like interface. Just like I drew in the first sketches. Here’s a first rendering:

And … that sucks too! It is just a boring copy of Apple’s Siri…
Plus, I want something dead simple that would provide natural ways to navigate between Wolfram|Alpha’s pods.
Haze-like GUI
Haze is a fantastic weather forecast app for iOS, with a very unique interface. Almost poetic. That’s definitely what I want for project Jarvis!
Here’s a sketch of what I have in mind:

I feel the GUI should be as minimalistic and intuitive as possible:
- A simple button to trigger voice acquisition
- A smaller button to bring up a keyboard and enter a query
- A third button to share the results
- On top of these buttons, the textual reformulation of the query
- At the center, Jarvis’ answer to the query
- Around the main bubble, various pods from Wolfram|Alpha
I like that!
And I guess that’ll be all for this weekend!
Published by: Fred on February 17th, 2013 | Filed under Art, Free Software, Raspberry Pi
Project “Jarvis”: step three (the brain)
During the last steps, I studied the feasibility of voice recording, speech-to-text and text-to-speech. Now comes another difficult part: the brain!
The last steps relied on external services for voice recognition and text-to-speech.
When it comes to Jarvis’ brain, the idea is twofold:
- Onboard answer engine: part of the analysis will be done onboard with simple regular expressions (as it was seen on the “proof of concept” step).
- External answer engine: the other part of the analysis is triggered whenever the first one fails. More than a fallback engine, this service should enrich the answer as much as possible.
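A minimal sketch of this two-stage idea, assuming the onboard rules are simple (pattern, handler) pairs tried in order. The rules, the `answer` function and the stubbed external engine are all hypothetical names for illustration, not the actual Jarvis code:

```python
import re
from datetime import datetime

# Hypothetical onboard rules: (pattern, handler) pairs, tried in order
ONBOARD_RULES = [
    (re.compile(r"\bquel(?:le)? heure\b", re.IGNORECASE),
     lambda match: datetime.now().strftime("Il est %H:%M")),
    (re.compile(r"\b(?:hello|bonjour)\b", re.IGNORECASE),
     lambda match: "Bonjour !"),
]


def answer(query, external_engine):
    """Try the onboard regexps first; call the external engine only on failure."""
    for pattern, handler in ONBOARD_RULES:
        match = pattern.search(query)
        if match:
            return "onboard", handler(match)
    return "external", external_engine(query)


def fake_external_engine(query):
    # Stand-in for the external service; a real one would do a web request
    return "external answer for: " + query


print(answer("quelle heure est-il ?", fake_external_engine))
print(answer("distance to the moon", fake_external_engine))
```

Passing the external engine in as a plain function keeps the onboard part testable without any network access.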
Wolfram|Alpha is a computational knowledge engine. It’s an online service that answers factual queries directly by computing the answer in terms of structured data, rather than providing a list of documents or web pages that might contain the answer as a search engine would do. It’s a pretty good candidate for Jarvis.
Jarvis’ Workflow
This simplified illustration describes Jarvis’ workflow, including voice acquisition, external speech-to-text, parsing & analyzing (including Wolfram|Alpha service), external text-to-speech and actions:

Wolfram|Alpha
Wolfram|Alpha provides a web-based API which allows clients to submit free-form queries. The API is implemented in a standard REST protocol using HTTP GET requests. Each result is returned as a descriptive XML structure wrapping the requested content format.
Roughly speaking, these results are divided into sections: assumptions and pods.
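The assumptions section tells how Wolfram|Alpha interpreted an ambiguous query, and which alternative interpretations it considered.
The pods section holds the results themselves: each pod is a titled block (input interpretation, result, and so on) made of one or more subpods.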
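To make the request/response shape more concrete, here is a small sketch, assuming the v2 `query` endpoint with its `input` and `appid` parameters and a result made of `pod`, `subpod` and `plaintext` elements. The XML sample is hand-written for illustration (not a real response), and `DEMO` is a placeholder AppID:

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

API_URL = "http://api.wolframalpha.com/v2/query"


def build_query_url(question, app_id):
    """Build the HTTP GET URL for a free-form question."""
    return API_URL + "?" + urlencode({"input": question, "appid": app_id})


def extract_pods(xml_text):
    """Return a list of (pod title, [plaintext of each subpod]) tuples."""
    root = ET.fromstring(xml_text)
    pods = []
    for pod in root.findall("pod"):
        texts = [sub.findtext("plaintext") or "" for sub in pod.findall("subpod")]
        pods.append((pod.get("title"), [text for text in texts if text]))
    return pods


# Hand-written sample mimicking the structure of a result
SAMPLE = """<queryresult success="true" error="false" numpods="2">
  <pod title="Input" id="Input">
    <subpod title=""><plaintext>2 + 2</plaintext></subpod>
  </pod>
  <pod title="Result" id="Result">
    <subpod title=""><plaintext>4</plaintext></subpod>
  </pod>
</queryresult>"""

print(build_query_url("2 + 2", "DEMO"))
for title, texts in extract_pods(SAMPLE):
    print(title, "->", texts)
```

For the GUI, each (title, texts) pair maps naturally onto one bubble around the main answer.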
The web-service API can be accessed from any language that supports web requests and XML. Furthermore, the Wolfram|Alpha community provides a set of bindings for Ruby, Perl, Python, PHP, .NET (yuck!), Java (and Mathematica, of course).
I created an account on Wolfram|Alpha and applied for an AppID (which I received right away). A free account allows up to 2000 queries a month, which should be plenty.
I downloaded the PHP binding to my Raspberry Pi, read the documentation (no, I didn’t, I’m kidding. Who reads manuals ???), and in less than 100 lines of code I got this running:

It’s a first try, but it looks promising. Jarvis may soon have a brain.
Published by: Fred on February 17th, 2013 | Filed under Free Software, Raspberry Pi
Project “Jarvis”: step two (speak to me)
In my previous post, I conducted a few experiments with speech recognition via Google’s Speech API and got good enough results to push project “Jarvis” a bit further. Now it is time for Jarvis to speak!
Text-To-Speech engines
There are many “Text-To-Speech” engines already packaged for the Raspberry Pi. Namely:
- espeak: eSpeak is a compact Open Source speech synthesizer (for English and other languages). It is available as a shared library and as a command line program to speak from a file or from stdin. It can be used as a front-end to mbrola diphone voices.
- festival: Festival Speech Synthesis System is a multi-lingual Open Source speech synthesizer which offers Text-To-Speech capabilities with various APIs.
- flite: festival-lite is a small run-time speech synthesis engine developed at Carnegie Mellon University, derived from Festival.
Let’s install and try these three engines:
```sh
apt-get install espeak
apt-get install festival
apt-get install flite
```
Unfortunately, I ran into a set of broken packages when I tried to install mbrola voices for espeak and festival:
```
root@applepie ~ # apt-get install mbrola-en1 mbrola-fr1 mbrola-fr4 mbrola-us1 mbrola-us2 mbrola-us3 festvox-en1 festvox-us1 festvox-us2 festvox-us3
Reading package lists... Done
Building dependency tree
Reading state information... Done
Some packages could not be installed. This may mean that you have
requested an impossible situation or if you are using the unstable
distribution that some required packages have not yet been created
or been moved out of Incoming.
The following information may help to resolve the situation:

The following packages have unmet dependencies:
 mbrola-en1 : Depends: mbrola but it is not installable
 mbrola-fr1 : Depends: mbrola but it is not installable
 mbrola-fr4 : Depends: mbrola but it is not installable
 mbrola-us1 : Depends: mbrola but it is not installable
 mbrola-us2 : Depends: mbrola but it is not installable
 mbrola-us3 : Depends: mbrola but it is not installable
E: Unable to correct problems, you have held broken packages.
```
It meant that the outputs from espeak and festival would quite probably be rather poor in quality. Thus, I introduced a new contender as an external service: Google Text-to-Speech API.
Here’s a little benchmark, where the speech outputs from each engine are compared, given the same quote from 2001: A Space Odyssey.
Benchmark #1: espeak
Getting a .wav file from plain text is quite easy:
```sh
espeak "Look Dave, I can see you're really upset about this" --stdout > espeak.wav
```
Here’s the .wav output from espeak:
[Audio sample: espeak]
As expected, it is really bad. It reminds me of the speech synthesizer I used to play with on my Atari 1040STF in the 80s
Benchmark #2: festival
Getting a .wav file from plain text is also easy:
```sh
echo "Look Dave, I can see you're really upset about this" | text2wave -o festival.wav
```
And the resulting .wav output is:
[Audio sample: festival]
Less robotic, but still very far from what I need for Jarvis
Benchmark #3: flite
Getting a speech output from flite is as simple as it is from espeak and festival:
```sh
echo "Look Dave, I can see you're really upset about this" | flite -o flite.wav
```
And the resulting .wav goes like this:
[Audio sample: flite]
Better. It’s getting HAL-like, but I really need something closer to a real human voice.
Benchmark #4: Google TTS
Google Text-To-Speech is a private REST API. Getting results is less straightforward but nonetheless very easily manageable. Here’s a little PHP script:
```php
<?php
// URL-encode the sentence so it can travel in the query string
$voice = urlencode("Look Dave, I can see you're really upset about this");
// Let curl fetch the MP3, sending a browser-like User-Agent
$cmd = '/usr/bin/curl -A "Mozilla" "http://translate.google.com/translate_tts?tl=en_gb&ie=UTF-8&q=' . $voice . '" > google.mp3';
shell_exec($cmd);
?>
```
And here’s the result (converted to the same .wav format):
[Audio sample: Google (en_gb)]
Much, much better. Maybe a little too slow. Let’s try to play with localizations and switch from British English to US English:
[Audio sample: Google (en_us)]
Surprisingly, the US voice is female.
Not bad. Now, let’s try a French version:
[Audio sample: Google (fr_fr)]
Really good. Also a female voice. It is actually very close to the synthetic voice used at SNCF (French Railroads) stations. Kind of a scary voice. It feels like … I’m gonna miss a f**king train.
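For the record, switching voices only means changing one query parameter. A small sketch of the URL construction, assuming the `tl` values match the labels of the samples above (`en_gb`, `en_us`, `fr_fr`); the `tts_url` helper is a made-up name, and nothing here guarantees that this private endpoint keeps answering:

```python
from urllib.parse import urlencode

TTS_ENDPOINT = "http://translate.google.com/translate_tts"


def tts_url(text, lang="en_gb"):
    """Build the request URL for one sentence and one locale."""
    return TTS_ENDPOINT + "?" + urlencode({"tl": lang, "ie": "UTF-8", "q": text})


quote = "Look Dave, I can see you're really upset about this"
for lang in ("en_gb", "en_us", "fr_fr"):
    print(tts_url(quote, lang))
```

Each printed URL can then be handed to curl exactly as in the PHP script.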
I think I’m gonna settle for the British voice from Google’s Text-To-Speech Engine.
I’ll have to rely (once more) on an external service, but an electronic butler has to be British
Published by: Fred on February 16th, 2013 | Filed under Free Software, Raspberry Pi
Project “Jarvis”: step one (proof of concept)
Adding Siri to both my old iPad 1 and iPhone 4 was a failure. Jailbreaking went fine, but messing with SiriPort was a complete disaster, and it took me nearly 2 hours to turn these devices back into something other than a brick.
And thus … no SiriProxy for me. But then again, why should I mess with existing closed-source crap, when I can build my own stuff ? Hum ?
Project “Jarvis”
Here comes Project “Jarvis”. Ok, the name sucks… I shouldn’t watch these Marvel movies. And the logo is no more than a copy of Siri’s own logo, with a touch of Raspberry color. I’ll work on these later: now, it is time to put the ideas behind this project to the test.
The principles are quite simple:

- 1 – A mobile app is used to record a simple question and send it to the Raspberry Pi
- 2 – The Raspberry Pi transforms the recorded voice into something understandable by Google’s Speech API and pushes the result to it
- 3 – Google Speech API returns its voice-to-text interpretation as a JSON data structure
- 4 – The Raspberry Pi parses the data, builds something out of it and sends its answer back to the mobile app (and possibly to a home automation system)
- 5 – The mobile app prints out the answer to the question.
- 6 – Applause and tears of joy
Proof of concept
First, let’s record a simple question. “Quelle heure est-il ?” (What time is it ?) will be a good start:

Then, let’s send it to the Raspberry Pi:
```sh
scp heure.caf root@applepie:/opt/jarvis
```
In order to get it interpreted by Google’s Speech API, one has to convert the recording from Apple’s CAF (Core Audio Format) to the much more standard FLAC format:
```sh
apt-get install ffmpeg
ffmpeg -i heure.caf heure.flac
```
Let’s send it to Google Speech API:
```sh
curl -i -X POST -H "Content-Type:audio/x-flac; rate=44100" -T heure.flac \
  "https://www.google.com/speech-api/v1/recognize?xjerr=1&client=chromium&lang=fr-FR&maxresults=10&pfilter=0"
```
After 1 or 2 seconds, I got the answer from Google:
```
HTTP/1.1 200 OK
Content-Type: application/json; charset=utf-8
Content-Disposition: attachment
Date: Sun, 10 Feb 2013 22:50:42 GMT
Expires: Sun, 10 Feb 2013 22:50:42 GMT
Cache-Control: private, max-age=0
X-Content-Type-Options: nosniff
X-Frame-Options: SAMEORIGIN
X-XSS-Protection: 1; mode=block
Server: GSE
Transfer-Encoding: chunked

{"status":0,"id":"f75093db420033490c2424cdb58de963-1","hypotheses":[{"utterance":"quel heure est il","confidence":0.61982137},{"utterance":"quelle heure est il"},{"utterance":"quel temps fait il"},{"utterance":"quelle heure est-il"},{"utterance":"quel temps va til"}]}
```
Not bad
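Picking the most likely sentence out of that answer takes only a few lines. A minimal sketch using the JSON body above; treating a non-zero `status` as a failure and a missing `confidence` as zero are my own assumptions, not documented behaviour:

```python
import json

# The JSON body returned for heure.flac, as shown above
BODY = (
    '{"status":0,"id":"f75093db420033490c2424cdb58de963-1","hypotheses":['
    '{"utterance":"quel heure est il","confidence":0.61982137},'
    '{"utterance":"quelle heure est il"},'
    '{"utterance":"quel temps fait il"},'
    '{"utterance":"quelle heure est-il"},'
    '{"utterance":"quel temps va til"}]}'
)


def best_utterance(body):
    """Return (utterance, confidence) for the best hypothesis, or None."""
    data = json.loads(body)
    # Assumption: status 0 means success, anything else means failure
    if data.get("status") != 0 or not data.get("hypotheses"):
        return None
    # Only the first hypothesis carries a confidence; count the others as 0
    best = max(data["hypotheses"], key=lambda h: h.get("confidence", 0.0))
    return best["utterance"], best.get("confidence")


print(best_utterance(BODY))
```

The remaining hypotheses are still worth keeping around: here, the fourth one (“quelle heure est-il”) is the correctly spelled sentence.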
Polishing up
First, let’s write a few lines of PHP on the Raspberry Pi (see previous post for the details of the Nginx/PHP installation):
- to trigger the ffmpeg conversion
- to send the converted FLAC recording to Google’s speech-to-text engine
- to get the JSON data structure back
- to parse the JSON result (a few regexps would do)
- to send back a well-thought-out answer to the question
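The real thing is written in PHP, but the first three steps can be sketched in a few lines of Python as well. This is only an outline under assumptions: the function names are made up, the `-y` (overwrite) and `-s` (silent) flags are my additions to the commands shown earlier, and `transcribe` needs ffmpeg, curl and a reachable endpoint to actually run:

```python
import subprocess
from pathlib import Path

# Endpoint and parameters copied from the curl command used in the proof of concept
SPEECH_URL = (
    "https://www.google.com/speech-api/v1/recognize"
    "?xjerr=1&client=chromium&lang=fr-FR&maxresults=10&pfilter=0"
)


def conversion_command(caf_path):
    """Return the ffmpeg argument list and the path of the FLAC file it creates."""
    flac_path = Path(caf_path).with_suffix(".flac")
    return ["ffmpeg", "-y", "-i", str(caf_path), str(flac_path)], flac_path


def upload_command(flac_path, rate=44100):
    """Return the curl argument list that posts the FLAC file to the speech API."""
    return [
        "curl", "-s", "-X", "POST",
        "-H", f"Content-Type: audio/x-flac; rate={rate}",
        "-T", str(flac_path),
        SPEECH_URL,
    ]


def transcribe(caf_path):
    """Convert the recording, upload it and return the raw JSON answer."""
    convert, flac_path = conversion_command(caf_path)
    subprocess.run(convert, check=True)
    result = subprocess.run(
        upload_command(flac_path), check=True, capture_output=True, text=True
    )
    return result.stdout


print(conversion_command("heure.caf")[0])
print(upload_command("heure.flac"))
```

Keeping the command construction apart from the execution makes it possible to check the arguments without touching the network.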
Then, let’s fire up XCode, and with the help of the Core Audio API documentation, let’s write down a few lines of Objective-C:

Pretty cool for two hours of work
Now what ?
I guess the proof of concept is conclusive
Now, the catch is that it is not exactly fast. Almost … as slow as Siri.
The exchange with Google is the bottleneck. Also, I’d rather not depend on a private external API. I guess one of the next steps will be to see how PocketSphinx would fit into this project.
The CAF-to-FLAC conversion could also be done on the iOS side of the project. I’ll check out this project later: https://github.com/jhurt/FLACiOS.
Also, Jarvis is literally speechless. Adding a few “text-to-wav” functionalities shouldn’t be too hard since espeak or festival are already packaged by Raspbian.
Then, of course, I’ll have to put a bit of thought into Jarvis’s brain (text analyzer) and hook the Raspberry Pi to some kind of Home Automation system.
And the iOS part needs a lot of looooove.
But I guess, that’s enough for a first step.
Published by: Fred on February 11th, 2013 | Filed under Free Software, Raspberry Pi
Raspberry Pi, ready to serve !
Alright. My Raspberry Pi is set up and delivering AirPrint and AirPlay services. Let’s add a few web capabilities.
A lightweight configuration: Nginx & SQLite
I felt like the usual Apache / MySQL duet might be a little too heavy for my tiny ApplePie. So, I opted for an Nginx / SQLite couple.
Everything is already packaged, thus, the installation is rather straightforward:
```sh
apt-get install nginx
apt-get install php5-fpm php5-cgi php5-cli php5-common
apt-get install php-pear php5-gd php5-imagick php5-mcrypt php5-memcache php5-sqlite
apt-get install sqlite3
useradd www-data
groupadd www-data
usermod -g www-data www-data
mkdir /var/www
chmod 775 /var/www -R
chown www-data:www-data /var/www
```
It’s time to modify php.ini:
```
cgi.fix_pathinfo = 0
```
… and to configure the Nginx default site (/etc/nginx/sites-enabled/default):
```
upstream php {
    server unix:/var/run/php5-fpm.sock;
}

server {
    root /var/www;
    listen 80;
    server_name applepie;

    access_log /var/log/nginx/access.log;
    error_log /var/log/nginx/error.log;

    index index.html index.php;

    location = /favicon.ico {
        log_not_found off;
        access_log off;
    }

    location / {
        try_files $uri $uri/ /index.php;
    }

    location ~ \.php$ {
        include fastcgi_params;
        fastcgi_intercept_errors on;
        fastcgi_pass php;
    }

    location ~* \.(js|css|png|jpg|jpeg|gif|ico)$ {
        expires max;
        log_not_found off;
    }
}
```
Let’s fire up Nginx:
```sh
service nginx start
```
… and check that everything is working nicely:

The Raspberry Pi is now ready to serve
I guess I have no reason left not to finish my cooking iOS app now …
Published by: Fred on February 3rd, 2013 | Filed under Free Software, Raspberry Pi