# Majordom

## General idea

- The STT tool "pocketsphinx_continuous" continuously records the microphone and searches for matching sentences in a given language
- It then interprets utterances according to "natural command" concepts (see ./testing_natural_command/*)
- Commands (more or less hardcoded) are sent to the xAAL bus
- The result is pronounced by the "espeak" TTS (just for fun)

## Notes

- The code here is based on the pocketsphinx_continuous release "5prealpha"
  URL: svn://svn.code.sf.net/p/cmusphinx/code/trunk/pocketsphinx/src/programs/continuous.c
  Last Changed Rev: 13156
  Last Changed Date: 2016-01-06 02:38:35 +0100 (Wed, 06 Jan 2016)
  You can use continuous.patch to rebuild the continuous.c file used here.
- However, this may also work with the last stable release (libpocketsphinx-dev 0.8-x)

## Results

- Tests are performed on the French language:
  https://sourceforge.net/projects/cmusphinx/files/Acoustic%20and%20Language%20Models/French/
- The acoustic model is cmusphinx-fr-ptm-8khz-5.2.tar.gz.
  The reference French dictionary is fr.dict.
  Download and install them somewhere on your system (e.g., /usr/local/share/pocketsphinx/model/fr/)
- The proposed lm-grammar and dictionary are built by the scripts in ./tools/
- Sentences to pronounce: ./tools/corpus.txt

## Documentation

- https://cmusphinx.github.io/wiki/tutorial/
- http://cmusphinx.sourceforge.net/wiki/tutorialpocketsphinx
- http://cmusphinx.sourceforge.net/wiki/faq#qcan_pocketsphinx_reject_out-of-grammar_words_and_noises

## Build

- Get the latest CMU Sphinx release, compile it, and install it:

```
svn checkout svn://svn.code.sf.net/p/cmusphinx/code/trunk cmusphinx-code
cd cmusphinx-code
for i in sphinxbase pocketsphinx cmuclmtk ; do
    cd $i
    ./autogen.sh && ./configure && make
    sudo make install
    cd ..
done
```

  Note that sphinxbase detects by itself at configure time whether your system uses PulseAudio, else falls back to ALSA, else falls back to OSS.
  Note that there is no option to force the choice.
  :-( This depends on the presence of /usr/include/pulse/pulseaudio.h and /usr/include/alsa/asoundlib.h on your system (libpulse-dev and libasound2-dev packages).
  You may have to trick /usr/local/src/cmusphinx-code/sphinxbase/configure.ac by hand to force one or the other (e.g., change line 123: ... AC_CHECK_HEADER(pulse/pulseaudio.h ...)).
  The JACK alternative has not been maintained since 2014. :-(
  As a result, this produces one libsphinxad library bound to the selected audio solution.
- If you choose the PulseAudio solution, you may encounter errors such as "Error opening audio device (null) for capture: Connection refused".
  This means that the libsphinxad library is unable to reach a PulseAudio server (either the user-session pa-server or the system one), or that it is unable to select the default sink/source pulse device, etc.
  In that case you may have to use the pax11publish -d, pacmd, and pactl tools, set the PULSE_SERVER and PULSE_COOKIE variables, edit /etc/pulse/default.pa, check ~/.config/pulse/ or /run/user/1000/pulse/native, and pray!
  See:
  https://wiki.archlinux.org/index.php/PulseAudio/Configuration
  https://wiki.archlinux.org/index.php/PulseAudio/Examples#Allowing_multiple_users_to_use_PulseAudio_at_the_same_time
- Dependencies:
  - libsphinxbase and libpocketsphinx (see above) (or the libsphinxbase-dev and libpocketsphinx-dev packages, if available and the versions match)
  - libxaal, plus the uuid-dev, libjson-c-dev, and libsodium-dev packages
  - the libfstrcmp-dev package
  - the libespeak-dev, mbrola, and mbrola-fr1 packages (just for fun)

## Discussions

- About Sphinx
  - The classical lm-grammar search works fine, but one may get false positives. The "keyword spotting" search, under development, seems very promising.
- About natural command
  - The algorithm is more or less a kind of maximum-likelihood estimation over a list of keywords associated with the expected commands. No grammar consideration.
  - Pro: the algorithm is naive but very simple ;-)
  - Cons: the associated xAAL commands are more or less hardcoded.
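The keyword-scoring idea above can be illustrated with a toy sketch. This is not the project's actual cmd_lib.c logic, and the command names and keyword lists below are hypothetical examples standing in for translate.dic entries: each known command is scored by how many of its keywords occur in the recognized utterance, and the best-scoring command wins.

```shell
#!/bin/sh
# Toy keyword-scoring sketch (hypothetical data, not translate.dic):
# count how many of each command's keywords appear in the utterance,
# and keep the best-scoring command.
utterance="allume la lampe du salon"

best_cmd=""
best_score=0
# Each line is "command|space-separated keywords".
while IFS='|' read -r cmd keywords; do
    score=0
    for kw in $keywords; do
        # Whole-word match: pad both sides with spaces.
        case " $utterance " in
            *" $kw "*) score=$((score + 1)) ;;
        esac
    done
    if [ "$score" -gt "$best_score" ]; then
        best_score=$score
        best_cmd=$cmd
    fi
done <<EOF
lamp.on|allume lampe
lamp.off|eteins lampe
shutter.up|ouvre volet
EOF

echo "matched: $best_cmd (score $best_score)"
```

With the sample utterance above this prints `matched: lamp.on (score 2)`. A real implementation would want fuzzy word comparison (hence the libfstrcmp-dev dependency) to tolerate recognition errors.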
## How to customize the Majordom

- The 'Speech To Text' part
  - Choose an acoustic model (from audio to phonemes)
  - Choose a dictionary (from phonemes to words)
  - Choose an lm-grammar (expected sentences, in terms of the probability of getting single words, sequences of two words, and sequences of three words)
  - Look at the scripts in ./tools/
  - Look at CMU Sphinx for details
- The 'Natural Command' part, i.e. from the world of sentences in natural language to the world of xAAL commands
  - Edit 'translate.dic': for each target word of the xAAL world (dev_types, methods to call, tags of devices), this gives the corresponding naming in your natural language (the source words)
  - If needed, edit 'cmd_lib.c' (and recompile): this holds the code of the xAAL actions, and their signatures in terms of the target words recognized by the previous steps
- The 'Text To Speech' part
  - The Majordom selects French or English output according to your locale, so set up your LC_LANG shell variable beforehand
  - The parameters of the espeak+mbrola engines are hardcoded in tts.c, so edit it and recompile if needed
  - The sentences to pronounce, and their translations (in French), are in .mo files, so customize the .po files and rebuild the .mo files if needed

## Phone to your Majordom

The idea is to connect the Majordom to an Asterisk IPBX via the ALSA audio loopback driver.

- Load the ALSA drivers at boot time. Edit /etc/modules and add:

```
snd-aloop
```

- Let the loopback driver be the default audio device. Create a file named /etc/modprobe.d/sound.conf and add:

```
alias snd-card-0 snd-aloop
```

- Configure Asterisk to load its ALSA channel plugin. Edit /etc/asterisk/modules.conf with:

```
load => chan_alsa.so
noload => chan_oss.so
```

- Configure the ALSA plugin for Asterisk. Edit /etc/asterisk/alsa.conf:

```
[general]
autoanswer=yes
context=default
extension=s
input_device=plughw:0,1
output_device=plughw:0,1
```

- Edit /etc/asterisk/extensions.conf:

```
[default]
../..
include => majordome
../..
```
```
[majordome]
exten => 6000,1,Answer
exten => 6000,2,Ringing
exten => 6000,3,Wait(2)
exten => 6000,4,eSpeak("Bonjour. Je suis le majordome de Experiment'HAAL. Comment puis-je vous aider ?")
exten => 6000,5,Dial(console/alsa)
exten => 6000,6,Hangup
```

Then, call 6000!
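As a small convenience, the two one-line snippets from the setup steps above can be staged in a scratch directory for review before you touch /etc yourself. The scratch paths here are just for illustration:

```shell
#!/bin/sh
# Stage the loopback-driver config snippets from the steps above for review;
# copy or append them to /etc yourself afterwards.
stage=$(mktemp -d)

# Line to append to /etc/modules: load the loopback driver at boot.
printf 'snd-aloop\n' > "$stage/modules.add"

# Content of /etc/modprobe.d/sound.conf: make the loopback the default card.
printf 'alias snd-card-0 snd-aloop\n' > "$stage/sound.conf"

echo "review the snippets under $stage before installing them"
```

After installing them (and rebooting, or running `sudo modprobe snd-aloop`), the loopback card should appear in /proc/asound/cards.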