Scheduling ATOM Publication Parsing: Spain Public Sector Contracts with PythonAnywhere

H-Barrio
Nov 2, 2021

We have a method to parse ATOM publication files, but so far we have to run it manually. The benefit of automating our market intelligence gathering is only realized when the information reaches us without our having to request it. In this post, we will schedule the publication reading so that our ATOM parser runs periodically throughout the day and generates the publication records automatically. In a future post, we will distribute the results of new publications to interested parties.

There are multiple task scheduling services on the market. Google offers Google Cloud, which we could use to run our existing Jupyter Notebooks. However, this tool is probably more powerful than such a simple task needs. For a more focused solution, we can use https://www.pythonanywhere.com/ to schedule our tasks in Python, our language of reference. Note that the script in this post accesses a URL that is not on the PythonAnywhere whitelist, and free accounts can only reach whitelisted sites, so a paid account is required. You can create an account and test it with the lowest-priced paid plan; this price is significantly lower than the minimum Google Cloud or Amazon Web Services monthly subscription.

We have to make some minor modifications to our existing parser so that it runs smoothly on PythonAnywhere. PythonAnywhere runs Python 3.9 by default. The XML parser's node.getchildren() method was removed in Python 3.9, so it has to be replaced with list(node). Also, our script will not necessarily run with the file's location as the working directory. We have to make the script aware of its own location so that it stores the records in a fixed location relative to the script. The file is available at this GitHub link. The changes are these:
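As a minimal sketch of these two changes (not the actual file from the repository), the snippet below iterates over child nodes with list(node) and resolves a records folder relative to a script path. The sample feed, the helper names child_tags and records_dir, and the folder name "records" are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical, tiny stand-in for an ATOM publication file.
SAMPLE_ATOM = """<feed xmlns="http://www.w3.org/2005/Atom">
  <entry><title>Contract A</title></entry>
  <entry><title>Contract B</title></entry>
</feed>"""


def child_tags(node):
    """Return the tags of a node's direct children.

    list(node) replaces node.getchildren(), which no longer
    exists in Python 3.9.
    """
    return [child.tag for child in list(node)]


def records_dir(script_file, folder_name="records"):
    """Create and return a folder located next to the given script.

    The path is built from the script's own location, so it does not
    depend on the working directory the scheduler happens to use.
    In the real script, script_file would be __file__.
    """
    base = Path(script_file).resolve().parent
    target = base / folder_name
    target.mkdir(parents=True, exist_ok=True)
    return target


root = ET.fromstring(SAMPLE_ATOM)
tags = child_tags(root)  # namespaced tags of the two <entry> children
```

Inside the parser itself, calling records_dir(__file__) once at the top and writing every output file under the returned path should keep the records in the same place regardless of how the script is launched.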

The remaining print statements, occasional dataframe head displays, and similar notebook-related commands can stay; they are unnecessary, since we will not inspect the console while the script runs on a schedule, but they do no harm.

Once the file is ready, upload it to the PythonAnywhere file system, under "Files". You can first create a separate directory to hold your script:

Clicking on the file brings us to the IDE, where we can run the script and check that there are no errors and that the records folder and file are created correctly for a single run:

The script seems to be performing the task. Now we can schedule it to run several times in a single day from the "Tasks" pane:

The publication parser will run three times each day at the predefined hours. With this automation in place, we do not have to worry about missing any changes to the publication database. The publication history will be stored in CSV files for later use. In our next post, we will add reporting functionality to this script to send the results of our search by email.
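The storage layout of the original script is not shown here, so the following is only one possible way to accumulate a publication history across scheduled runs: each run appends its rows to the same CSV file and writes the header only when the file is new. The column names in FIELDS and the function append_records are hypothetical.

```python
import csv
from datetime import datetime, timezone
from pathlib import Path

# Hypothetical column names; the real parser may store other fields.
FIELDS = ["retrieved_at", "title", "link"]


def append_records(csv_path, records, now=None):
    """Append records to csv_path and return how many rows were written.

    The header row is written only when the file does not exist yet,
    so repeated scheduled runs build up a single history file.
    """
    csv_path = Path(csv_path)
    csv_path.parent.mkdir(parents=True, exist_ok=True)
    is_new = not csv_path.exists()
    stamp = (now or datetime.now(timezone.utc)).isoformat()
    written = 0
    with csv_path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        for record in records:
            # Stamp every row with the time of the run that found it.
            writer.writerow({"retrieved_at": stamp, **record})
            written += 1
    return written
```

Stamping each row with the retrieval time makes it possible to tell later which of the three daily runs first saw a given publication.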

If you require data model development, deployment, verification, or validation, do not hesitate to contact us. We will also be glad to help you with your machine learning or artificial intelligence challenges when applied to asset management, trading, or risk evaluations. Ostirion does not endorse any of the products mentioned in this post and is in no way affiliated with the mentioned companies, nor does it receive any compensation for this publication.

