The problem is the following: we want the machine to store a list of words it hears over the microphone during a given "listening session," rather than indefinitely, as Alexa, Cortana, and similar spyware do. This is difficult to do through the web browser, so the simplest way to demonstrate it is to listen to our conversation locally, build the list of heard keywords locally, and periodically send it to our other processors. FTP may be the simplest, most straightforward way to achieve this exchange.
This is by no means a clean way to accomplish the task, but it is quick and easy. First, we will need an FTP server, preferably a free one. This page lists free server options: https://www.filestash.app/free-ftp.html.
Once we have our FTP server, we need to run the following Python script locally so that it has easy access to the microphones installed on our system. The script starts with the required imports:
import ftplib
import speech_recognition as sr
import multiprocessing as mp
import io
Then, we create a function that opens the connection to our FTP server and returns the connection object:
def ftp_server():
    # Open the FTP connection
    # Use https://filemanager.ai/new/# to inspect the server contents
    ftp_host = 'your server hostname here'
    ftp_user = 'your user name here'
    ftp_password = 'password for the user here'
    ftp = ftplib.FTP(host=ftp_host, user=ftp_user, passwd=ftp_password)
    return ftp
This function writes the words our microphone hears to our FTP server. The file holding the words is named words.txt and is overwritten on every upload:
def write_to_ftp(ftp, text):
    ftp_dir = "heard_words_storage"
    ftp.cwd("/")
    try:
        ftp.mkd(ftp_dir)
    except ftplib.error_perm:
        # mkd raises error_perm for several reasons; most likely the
        # directory already exists.
        print('Cannot create directory; it may already exist.')
    ftp.cwd(ftp_dir)
    print(text)
    text = bytes(str(text), 'utf-8')
    bio = io.BytesIO(text)
    ftp.storbinary('STOR words.txt', bio)
    return
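Because the whole list is re-serialized on every cycle, the contents of words.txt are simply the str() form of the Python list, encoded as UTF-8. A minimal sketch of that round trip, with no FTP server needed (serialize_words is an illustrative helper, not part of the script):

```python
import io

def serialize_words(word_list):
    """Mimic what write_to_ftp uploads: the str() form of the
    list, encoded as UTF-8 bytes and wrapped in a BytesIO."""
    return io.BytesIO(bytes(str(word_list), 'utf-8'))

bio = serialize_words(['hello', 'world'])
# The exact text that ends up in words.txt:
print(bio.read().decode('utf-8'))  # → ['hello', 'world']
```

This also shows why the file is not a plain word-per-line listing: whatever reads it later has to parse a Python list literal.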
Our core function listens for words and tries to recognize them through the free default key for the Google Speech Recognition API. The listening session stops as soon as the word "stop" is recognized. On every listening cycle, the function calls our writing function, which stores the full set of heard words in the text file:
def listen_words():
    heard_words_list = []
    session = ftp_server()
    r = sr.Recognizer()
    m = sr.Microphone(device_index=0)
    while True:
        with m as source:
            r.adjust_for_ambient_noise(source)
            audio = r.listen(source)
        try:
            words = r.recognize_google(audio)
        except (sr.UnknownValueError, sr.RequestError):
            words = 'unintelligible'
        for w in words.split(" "):
            heard_words_list.append(w)
        print(words)
        print(heard_words_list)
        write_to_ftp(session, heard_words_list)
        if 'stop' in words:
            session.close()
            break
    return
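The loop's bookkeeping can be exercised without a microphone by replaying pre-recorded transcripts through the same split-accumulate-stop logic. A sketch (accumulate_transcripts is an illustrative helper, not part of the script):

```python
def accumulate_transcripts(transcripts):
    """Replay listen_words' bookkeeping on a list of transcript
    strings instead of live audio: split each phrase into words,
    accumulate them, and stop once 'stop' appears."""
    heard_words_list = []
    for words in transcripts:
        for w in words.split(" "):
            heard_words_list.append(w)
        if 'stop' in words:
            break
    return heard_words_list

print(accumulate_transcripts(['hello there', 'please stop now', 'ignored']))
# → ['hello', 'there', 'please', 'stop', 'now']
```

Note that `'stop' in words` is a substring test on the whole phrase, so a word like "stopwatch" would also end the session; checking `'stop' in words.split(" ")` would be stricter.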
As we will later add more functionality to this model and we do not want the listening function to block execution, we run this main function in a separate process:
if __name__ == '__main__':
    p = mp.Process(target=listen_words)
    p.start()
Executing this script should upload the list of words your local microphone hears to your FTP server. We will have to check how fast this listening-and-writing loop is when we test a GPU-enabled server's capacity to fetch these words and drive them through other models. The script will not work from Google Colab because Colab has no access to our microphone; it is shown in this notebook for simplicity and for copy-pasting if required.
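On the consuming side, the GPU-enabled server would download words.txt (for example via ftplib's retrbinary) and parse the stored list literal back into Python. A hedged sketch of just the parsing step, assuming the payload bytes have already been fetched (parse_words_file is a hypothetical helper):

```python
import ast

def parse_words_file(raw_bytes):
    """Recover the word list from the bytes stored in words.txt.
    The file holds the str() form of a Python list, so
    ast.literal_eval can parse it back safely (unlike eval, it
    only accepts literals)."""
    return ast.literal_eval(raw_bytes.decode('utf-8'))

# In practice raw_bytes would come from ftp.retrbinary('RETR words.txt', ...);
# here we parse a sample payload directly.
print(parse_words_file(b"['hello', 'world', 'stop']"))
# → ['hello', 'world', 'stop']
```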
Do not hesitate to contact us if you require quantitative model development, deployment, verification, or validation. We will also be glad to help with your machine learning or artificial intelligence challenges in asset management, automation, and intelligence gathering from satellites, drones, fixed-point imagery, and even dreams.