
The Machine, Dreaming - Part II

Recent research suggests that dreams are part image and part speech. If our machine is going to dream as we humans do, it will have to recognize both sounds and images. OSTIRION has extensive experience in image recognition, less so in sound and speech recognition, so we will experiment with a simple demonstration. Our dreaming machine will need to listen to the world, to humans, or to other dreaming machines to incept its dreams. Using a Colab notebook as an example, the process to turn speech into text follows.


We will need two modules beyond the Python defaults present in Colab: SpeechRecognition, for obtaining text from speech, and FFmpeg, for processing our sound inputs. We also need a series of imports from the default modules:

!pip install SpeechRecognition
!pip install ffmpeg-python
import ffmpeg
import speech_recognition
import requests
import urllib.parse
from google.colab import drive
import os
import io
import time
from base64 import b64decode
from google.colab.output import eval_js
from scipy.io.wavfile import read as wav_read
from scipy.io.wavfile import write
from IPython.display import Javascript

The first task is finding the location of our current notebook so that we can store the detected speech text properly. This code comes from our previous publication on finding the location of notebook files by walking the file system:

drive.mount('/content/drive')


def locate_nb(set_singular=True):
    found_files = []
    paths = ['/']
    # Ask the local Colab session API for the name of the running notebook.
    nb_address = 'http://172.28.0.2:9000/api/sessions'
    response = requests.get(nb_address).json()
    name = urllib.parse.unquote(response[0]['name'])

    dir_candidates = []

    # Walk the file system looking for files matching the notebook name.
    for path in paths:
        for dirpath, subdirs, files in os.walk(path):
            for file in files:
                if file == name:
                    found_files.append(os.path.join(dirpath, file))

    found_files = list(set(found_files))

    if len(found_files) == 1:
        nb_dir = os.path.dirname(found_files[0])
        dir_candidates.append(nb_dir)
        if set_singular:
            print('Singular location found, setting directory:')
            os.chdir(dir_candidates[0])
    elif not found_files:
        print('Notebook file name not found.')
    elif len(found_files) > 1:
        print('Multiple matches found, returning list of possible locations.')
        dir_candidates = [os.path.dirname(f) for f in found_files]

    return dir_candidates

nb_dir = locate_nb()
print(os.getcwd())

We cannot access any microphone connected to the virtual machine running our Colab notebook, so we have to use JavaScript to access the microphone on our local computer:

RECORD = """
const sleep  = time => new Promise(resolve => setTimeout(resolve, time))
const b2text = blob => new Promise(resolve => {
  const reader = new FileReader()
  reader.onloadend = e => resolve(e.srcElement.result)
  reader.readAsDataURL(blob)
})
var record = time => new Promise(async resolve => {
  stream = await navigator.mediaDevices.getUserMedia({ audio: true })
  recorder = new MediaRecorder(stream)
  chunks = []
  recorder.ondataavailable = e => chunks.push(e.data)
  recorder.start()
  await sleep(time)
  recorder.onstop = async ()=>{
    blob = new Blob(chunks)
    text = await b2text(blob)
    resolve(text)
  }
  recorder.stop()
})
"""

def record(sec=3):
  print("Speak Now...")
  display(Javascript(RECORD))
  sec += 1  # record one extra second as a safety buffer
  s = eval_js('record(%d)' % (sec*1000))
  print("Done Recording!")
  binary = b64decode(s.split(',')[1])  # strip the data-URL header and decode the base64 payload
  
  process = (ffmpeg
    .input('pipe:0')
    .output('pipe:1', format='wav')
    .run_async(pipe_stdin=True, pipe_stdout=True, pipe_stderr=True, quiet=True, overwrite_output=True)
  )
  output, err = process.communicate(input=binary)
  
  riff_chunk_size = len(output) - 8
  # Break up the chunk size into four bytes, held in b.
  q = riff_chunk_size
  b = []
  for i in range(4):
      q, r = divmod(q, 256)
      b.append(r)

  # Replace bytes 4:8 in proc.stdout with the actual size of the RIFF chunk.
  riff = output[:4] + bytes(b) + output[8:]

  sr, audio = wav_read(io.BytesIO(riff))

  return audio, sr

This microphone-accessing code comes in part from this publication by https://ricardodeazambuja.com/. The RECORD constant contains the JavaScript code that accesses our microphone through the notebook instance's browser. Make sure you allow your browser to access the microphone while running it, and remember to revoke access afterwards to maintain privacy. The record function will record microphone sounds in 3-second chunks until the speech recognition module recognizes the word "stop."
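As a quick sanity check of the recording pipeline (this snippet is an illustrative addition, not part of the original notebook), we can capture a short clip, store it with scipy's wav writer, and play it back inline:

from IPython.display import Audio

audio, sr = record(sec=3)          # capture roughly three seconds of microphone audio
write('mic_check.wav', sr, audio)  # persist the clip for later inspection
Audio(data=audio.T, rate=sr)       # inline player; transpose in case the clip is stereo

Detected words are added to a file by the following functions: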

def create_text_records(text_to_record=''):
    records_dir = 'heard_texts'
    sess = time.strftime("%Y%m%d%H")
    file = f'heard_texts{sess}.txt'
    # Create the records directory on first use; a new file is started every hour.
    os.makedirs(records_dir, exist_ok=True)

    full_path = os.path.join(records_dir, file)
    # Append each detected text as a new line.
    with open(full_path, "a") as f:
        f.write(text_to_record)
        f.write("\n")
    return

def words_to_text():
  text = 'Nothing heard'
  recognizer = speech_recognition.Recognizer()
  while 'stop' not in text:
    audio_array, sr = record(sec=3)
    # Serialize the numpy audio into an in-memory wav file; scipy rewinds
    # the buffer after writing, so it can be read back directly.
    byte_io = io.BytesIO(bytes())
    write(byte_io, sr, audio_array)
    result_bytes = byte_io.read()
    # The final argument, 2, is the sample width in bytes (16-bit audio).
    audio_data = speech_recognition.AudioData(result_bytes, sr, 2)
    try:
      text = recognizer.recognize_google(audio_data)
    except (speech_recognition.UnknownValueError,
            speech_recognition.RequestError):
      text = 'unintelligible'
    create_text_records(text_to_record=text)
  return

The directory for storing our texts will be heard_texts, where every hour a new file will be created containing the recognized words. The underlying idea is to keep all the words heard by the machine for, as an example, a full day. These words can later be processed, weighted, analyzed, and used to produce dreams.
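As a minimal sketch of that later processing (the frequency weighting here is purely illustrative, not the notebook's actual method), we could load every text heard today and rank the words by how often they occur, relying on the file naming convention used by create_text_records:

import glob
from collections import Counter

# 'day' matches the YYYYMMDD prefix of the YYYYMMDDHH stamp used when
# naming the hourly text files above.
day = time.strftime("%Y%m%d")
word_counts = Counter()
for path in glob.glob(f'heard_texts/heard_texts{day}*.txt'):
    with open(path) as f:
        word_counts.update(f.read().lower().split())

print(word_counts.most_common(10))  # the day's most frequently heard words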


Calling the words_to_text function starts our word recognition loop, which records until "stop" is recognized:

words_to_text()

Now open the files in the "heard_texts" directory to check what was understood by the speech recognition module; we will use these texts in the future to prime our dreaming machine and turn them into dreamscapes:
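A simple way to inspect them from inside the notebook (again, an illustrative snippet) is to print every stored file in order:

for name in sorted(os.listdir('heard_texts')):
    path = os.path.join('heard_texts', name)
    print(f'--- {name} ---')
    with open(path) as f:
        print(f.read())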



Do not hesitate to contact us if you require quantitative model development, deployment, verification, or validation. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, automation, or intelligence gathering from satellite, drone, and fixed-point imagery, and even dreams.


The notebook with this demonstration is here.
