
Basic Approach : Downloading the whole file in memory

The most basic way to read and use remote content is to download the whole file into memory. In Go this is pretty easy :

package main

import (
    "io"
    "log"
    "net/http"
)

func main() {
    resp, err := http.Get("http://mylargefile.com/thefile.txt")
    if err != nil {
        log.Fatalf("Error while getting the url : %v", err)
    }
    defer resp.Body.Close()
    // io.ReadAll needs Go 1.16+; older versions use ioutil.ReadAll from "io/ioutil"
    content, err := io.ReadAll(resp.Body)
    if err != nil {
        log.Fatalf("Error while reading the body : %v", err)
    }
    // Do things with the content (Go refuses to compile an unused variable)
    log.Printf("Downloaded %d bytes", len(content))
}

Pretty easy. This approach will work in most cases. There are a few drawbacks though, performance for example : you have to wait for the whole file to be downloaded before doing anything with its content. And what happens if the file is really large ? Like, bigger-than-your-total-RAM large ? The solution is simple : read the remote file's content as a stream, using a scanner.

Stream Approach

package main

import (
    "bufio"
    "log"
    "net/http"
)

func main() {
    resp, err := http.Get("http://mylargefile.com/thebiggestfileever.txt")
    if err != nil {
        log.Fatalf("Error while getting the url : %v", err)
    }
    defer resp.Body.Close()
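    scanner := bufio.NewScanner(resp.Body)
    for scanner.Scan() {
        // Do something with each line (scanner.Text() returns it)
    }
    // Scan stops on a read error or on a line over the 64 KiB default limit
    if err := scanner.Err(); err != nil {
        log.Fatalf("Error while scanning the body : %v", err)
    }
}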

As you can see, the scanner processes the file line by line while it is still being downloaded, which saves memory and improves performance. It is no more complex to use than the basic approach, and it will save some execution time.

So I had this little challenge : write a one-line lottery simulation. It looked pretty interesting at first, and then someone pointed out to me that there is a random.sample function. Anyway, here are some pretty solutions sent by members of the #python channel on freenode. The first solution, which I worked on using lambdas :

import random
lst = list(range(0, 100))
[print(y) for x, y in [(lambda x: [lst.remove(x), x])(random.choice(lst)) for x in range(0,10)]]

Second solution, using random.sample as suggested by TFKyle

import random
[print(x) for x in random.sample(range(100), 10)]

|Zz|'s first solution

import random
lst = [(lambda x,y: print(y) or x)(lst,y) for x, y, lst in [(lambda x: (lst.remove(x), x, lst))(random.choice(lst)) for x in range(0,10) for lst in [list(range(100))]]][-1]

|Zz|'s second solution

import random
lst = [(lambda lst,x: print(x) or lst)(lst,x) for _, x, lst in [(lambda x: (lst.remove(x), x, lst))(random.choice(lst)) for i in range(0,10) for lst in [list(range(100))]]][-1]

(That guy is crazy o.o)

tech2's solution

import random
print('\n'.join(map(str, random.sample(range(100), 10))))

I mainly worked on this to learn more about lambdas, which have always been a bit obscure to me. My main problem (which |Zz| solved) was that I didn't know how to define the list (range(100)) inside the lambda.
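One detail worth checking in |Zz|'s versions : the `for lst in [list(range(100))]` clause comes after the `for ... in range(0,10)` clause, so the list is rebuilt for every draw and the same number can come out twice. Below is a minimal sketch (not |Zz|'s code) with the two clauses swapped, so that a single list is shared by all the draws. The `draw` function and its parameters are names made up for this example.

```python
import random


def draw(count=10, pool=100):
    """Draw `count` distinct numbers from range(pool) in a single expression."""
    # `for lst in [...]` comes first, so the list is created once and every
    # later `remove` call shrinks that same list.
    return [
        (lambda x: lst.remove(x) or x)(random.choice(lst))
        for lst in [list(range(pool))]
        for _ in range(count)
    ]


if __name__ == "__main__":
    print("\n".join(map(str, draw())))
```

`lst.remove(x)` returns None, so `lst.remove(x) or x` evaluates to the number that was just removed. In real code random.sample remains the simpler choice.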

Tutorial on PocketSphinx with Python 3.4

Disclaimer

This code is actually getting kind of old. I suspect some of the code samples won't work as expected. Please let me know in the comments if so ☺

Introduction

CMUSphinx is a great project that I have wanted to try for a really long time. I'm getting tired of speech recognition libraries that just make calls to the Google API, which requires a constant internet connection. We will mainly use two components of the CMUSphinx project : sphinxbase and pocketsphinx.

This tutorial will cover the following points :

  • Installation
    • Sphinxbase
    • Pocketsphinx
  • Installing support for another language (French)
  • Python

We will work in a test directory, so just create a new one and go into it.

$ mkdir sphinx_test
$ cd sphinx_test

Installation Part


Sphinxbase Installation

$ git clone https://github.com/cmusphinx/sphinxbase
$ cd sphinxbase
$ ./autogen.sh --prefix=/usr

Note that I set the prefix to /usr so that the library is installed in the same place all the other libs are. Otherwise it will install it in /usr/local/lib/python3.x/site-packages (where x is your python version) which isn't a problem if you set your PYTHONPATH environment variable accordingly.

"But but, why do you stop here ? Can't we just make and make install ?"

Short answer : you can. Long answer : you can, but you'll soon notice errors in the makefile concerning the documentation. Sphinx itself will be usable, but its docs won't be. I don't know about you, but I prefer when my system has all the docs I need to develop. The problem comes from dox2swig.py, which needs to be run with a 2.x version of Python. So just open the doc/Makefile file and replace the line PYTHON=/usr/bin/python with PYTHON=/usr/bin/python2. Now go on, you can run make and make install. And no, this won't affect anything other than the documentation generation.

$ make
$ sudo make install

Pocketsphinx Installation

Hey remember what you just did in the above section ? Well just do the same for pocketsphinx. (Exactly the same, like... Really.)

$ git clone https://github.com/cmusphinx/pocketsphinx
$ cd pocketsphinx
$ ./autogen.sh --prefix=/usr

Modify doc/Makefile the same way (PYTHON=/usr/bin/python2), then :

$ make
$ sudo make install
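Once both installs are done, a quick check from Python tells you whether the bindings ended up somewhere the interpreter looks. This is only a sketch : `is_importable` is a helper name invented here, and the two module names are the ones imported by the script at the end of this tutorial.

```python
import importlib.util
import sys


def is_importable(name):
    """Return True if a top-level module called `name` can be found."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    for module in ("sphinxbase", "pocketsphinx"):
        status = "found" if is_importable(module) else "NOT found"
        print(module, status)
    # If a module is missing, compare your install prefix with these paths
    # (or extend PYTHONPATH accordingly).
    print("\n".join(sys.path))
```

Run it with the same interpreter you plan to use for the recognition script, since each Python version has its own site-packages directory.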

French

You'll need to download the following files : french3g62K.lm.dmp, frenchWords62K.dic and lium_french_f0.tar.gz. If you installed sphinxbase and pocketsphinx correctly, you'll have a folder /usr/share/pocketsphinx/model/ which already contains an English model.

$ cd /usr/share/pocketsphinx/model/
$ sudo mkdir fr_FR; cd fr_FR
$ sudo mv ~/Downloads/{french3g62K.lm.dmp,frenchWords62K.dic,lium_french_f0.tar.gz} .
$ sudo mkdir fr_FR
$ sudo tar xvf lium_french_f0.tar.gz -C fr_FR/

So you'll have the following files now :

├── french3g62K.lm.dmp
├── frenchWords62K.dic
└── fr_FR
    ├── feat.params
    ├── LICENSE
    ├── mdef
    ├── means
    ├── mixture_weights
    ├── noisedict
    ├── README
    ├── transition_matrices
    └── variances

1 directory, 12 files

Now your pocketsphinx installation supports the French language. Note that I put those files in that directory only for convenience, so that all my models are in the same place. But you can totally put them wherever you want.
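If you do move the files, the paths passed to the decoder have to follow. Here is a small sketch that checks a model directory before starting the decoder. The layout is the one shown in the tree above (only a few of the acoustic model files are checked), and `missing_model_files` is a hypothetical helper, not part of pocketsphinx.

```python
import os

# Layout taken from the tree above: two files plus the acoustic model folder.
EXPECTED = (
    "french3g62K.lm.dmp",
    "frenchWords62K.dic",
    os.path.join("fr_FR", "mdef"),
    os.path.join("fr_FR", "means"),
    os.path.join("fr_FR", "variances"),
)


def missing_model_files(model_dir):
    """Return the expected paths (relative to model_dir) that do not exist."""
    return [rel for rel in EXPECTED
            if not os.path.exists(os.path.join(model_dir, rel))]


if __name__ == "__main__":
    missing = missing_model_files("/usr/share/pocketsphinx/model/fr_FR/")
    if missing:
        print("Missing:", ", ".join(missing))
    else:
        print("French model looks complete")
```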


Python

Here is a script that I took and modified from an article on Mattz69's blog.

#!/usr/bin/env python

import os
import sys
from ctypes import *
from contextlib import contextmanager

import pyaudio
from pocketsphinx.pocketsphinx import *
from sphinxbase.sphinxbase import *

script_dir = os.path.dirname(os.path.realpath(__file__))
model_dir = "/usr/share/pocketsphinx/model/fr_FR/"

hmm = os.path.join(model_dir, "fr_FR")
lm = os.path.join(model_dir, "french3g62K.lm.dmp")
dic = os.path.join(model_dir, "frenchWords62K.dic")

sys.stderr = open(os.path.join(script_dir, "stderr.log"), "a")

ERROR_HANDLER_FUNC = CFUNCTYPE(None, c_char_p, c_int, c_char_p, c_int, c_char_p)


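def py_error_handler(filename, line, function, err, fmt):
    # Swallow ALSA's error messages instead of printing them
    pass


# Keep a module-level reference so the callback is not garbage collected
c_error_handler = ERROR_HANDLER_FUNC(py_error_handler)


@contextmanager
def noalsaerr():
    asound = cdll.LoadLibrary('libasound.so')
    asound.snd_lib_error_set_handler(c_error_handler)
    try:
        yield
    finally:
        # Restore the default handler even if the body raises
        asound.snd_lib_error_set_handler(None)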

config = Decoder.default_config()
config.set_string('-hmm', hmm)
config.set_string('-lm', lm)
config.set_string('-dict', dic)
config.set_string('-logfn', '/dev/null')
decoder = Decoder(config)

with noalsaerr():
    p = pyaudio.PyAudio()
stream = p.open(format=pyaudio.paInt16, channels=1, rate=16000, input=True, frames_per_buffer=1024)
stream.start_stream()
in_speech_bf = True
decoder.start_utt()
while True:
    buf = stream.read(1024)
    if buf:
        decoder.process_raw(buf, False, False)
        try:
            if decoder.hyp().hypstr != '':
                print('Partial decoding result:', decoder.hyp().hypstr)
        except AttributeError:
            pass
        if decoder.get_in_speech():
            sys.stdout.write('.')
            sys.stdout.flush()
        if decoder.get_in_speech() != in_speech_bf:
            in_speech_bf = decoder.get_in_speech()
            if not in_speech_bf:
                decoder.end_utt()
                try:
                    if decoder.hyp().hypstr != '':
                        print('Stream decoding result:', decoder.hyp().hypstr)
                except AttributeError:
                    pass
                decoder.start_utt()
    else:
        break
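decoder.end_utt()
# The stream returned no data: report the last hypothesis, if there is one
hyp = decoder.hyp()
print('An error occurred:', hyp.hypstr if hyp is not None else '(no hypothesis)')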