
Slightly Faster Re-runs of Python Notebooks by Smart Installs

Jupyter notebooks in general, and Google Colab in particular, are excellent tools for prototyping code and quick testing. They are also becoming the de facto deployment mode for many machine learning and artificial intelligence models, even though this was not their original purpose.


One slightly frustrating part of using a Google Colab notebook is the time-consuming installation of Python packages every time you run or re-run the notebook. pip goes through the install process for every requirement, even those already present, and produces installation logs that are uninformative for this use case.


Our proposal for a slight improvement is as follows. Imagine that we need to investigate web parsing and will be using the following packages:

import requests
from urllib.parse import urlparse, urljoin
from bs4 import BeautifulSoup
from requests_html import HTMLSession
import nest_asyncio

We may not know exactly which of these are already installed on our system. Typically, we hit an import error, install the missing module with whatever package manager we prefer, and continue. Of course, this happens every time we share the notebook or run it ourselves.


Instead of starting with our regular imports, we begin by importing subprocess, sys, and iter_modules from pkgutil. Then we list our required modules as strings, using the name of each package; this list could also be obtained from a requirements.txt file. We are ignoring specific versions this time, as they can be added later if needed.

import subprocess
import sys
from pkgutil import iter_modules

reqs = ['nest_asyncio',
        'requests_html',
        'bs4',
        'urllib',  # standard library, so it is always found and never installed
        'requests']
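
If the requirements live in a requirements.txt file, the list above could be built from its contents instead of being typed by hand. The following is only a minimal sketch: the helper name parse_requirements is hypothetical, and it assumes simple "name" or "name==version" lines, so URLs, editable installs, and other pip options are not handled.

```python
def parse_requirements(text):
    """Return the package names found in requirements-style text, ignoring versions."""
    names = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if not line or line.startswith("-"):  # skip blank lines and pip options such as -r
            continue
        # Keep only the name: cut at the first version, extras, or marker separator.
        for sep in ("==", ">=", "<=", "~=", "!=", ">", "<", "[", ";", " "):
            line = line.split(sep, 1)[0]
        names.append(line.strip())
    return names
```

With a file on disk, this could be called as reqs = parse_requirements(open("requirements.txt").read()). Keep in mind that a requirements file holds the names pip uses, which do not always match the names used in import statements.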

We will define a function that installs a target package by running pip through subprocess, using sys.executable so that the package is installed for the interpreter running the notebook:

def install(package):
    subprocess.check_call([sys.executable, "-m", "pip", "install", package])

Then we check which modules are already available on our system with iter_modules, which lists the top-level modules that can be imported:


installed_packages_list = [p.name for p in iter_modules()]
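
One caveat is that iter_modules reports import names, while pip works with distribution names, and the two can differ (bs4 is provided by beautifulsoup4, for example). A possible variation, shown here only as a sketch, checks each module with importlib.util.find_spec and translates the name before installing. The PIP_NAMES mapping and both helper functions are hypothetical names introduced for illustration.

```python
from importlib.util import find_spec

# Illustrative mapping from import name to the name pip expects.
PIP_NAMES = {"bs4": "beautifulsoup4", "requests_html": "requests-html"}

def is_importable(module_name):
    """Return True if the top-level module can be located, without importing it."""
    return find_spec(module_name) is not None

def missing_packages(modules, pip_names=PIP_NAMES):
    """Return the pip names of the modules that cannot currently be imported."""
    return [pip_names.get(m, m) for m in modules if not is_importable(m)]
```

Calling missing_packages(reqs) would then return only the names that still need to be passed to install.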

Now we can iterate over our required packages and install each one only if it is not already present in the current notebook session, possibly saving some minutes per run:

for r in reqs:
  if r in installed_packages_list:
    print(f"{r} already installed, skipping install.")
    continue
  print(f"Installing {r}, please wait...")
  install(r)
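
The steps above can also be gathered into a single reusable function. The version below is a sketch rather than a definitive implementation: ensure_installed and its installer parameter are hypothetical names, and the --quiet flag is added to trim pip's log output, which the snippets above do not do.

```python
import subprocess
import sys
from pkgutil import iter_modules

def install(package):
    # --quiet reduces pip's installation log output.
    subprocess.check_call([sys.executable, "-m", "pip", "install", "--quiet", package])

def ensure_installed(packages, installer=install):
    """Install only the packages that are not importable; return the ones installed."""
    available = {m.name for m in iter_modules()}  # set for fast membership checks
    newly_installed = []
    for name in packages:
        if name in available:
            continue  # already present, nothing to do
        installer(name)
        newly_installed.append(name)
    return newly_installed
```

A notebook's first cell could then call ensure_installed(reqs) before the regular imports. Passing the installer as a parameter also makes it possible to try the logic with a stand-in function that installs nothing.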

You can try this in this Colab Notebook.


Do not hesitate to contact us if you require quantitative model development, deployment, verification, or validation. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, automation, or intelligence gathering from satellite, drone, or fixed-point imagery. Also, check our AI-powered Spanish public tender search application, which uses sentence similarity analysis to provide better tender matches to selling companies.



Faster PIP install AI generated art.








