This tutorial is a basic introduction to Python for anyone in the heritage community. It is an updated version of the Introduction to Coding appendix in my book Open Heritage Data.
What is Python?:
Python is a general-purpose, open-source programming language first released in 1991 (python.org). It is often described as fast, friendly and easy to learn. While it can be used for a wide variety of purposes, including applications, websites and games, it is also popular in data science because of its data manipulation tools and its relatively gentle learning curve. Python comes with a large number of modules and packages, and many more can be installed, providing specialised, ready-to-use functions. This means that you can do a lot of data manipulation with only a few lines of code.
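As a small illustration of what "ready-to-use functions" means, the sketch below imports two modules that come with Python's standard library, statistics and collections, and applies them to a short list of object dates and materials. The data is made up purely for illustration; the point is that each calculation takes a single line once the module is imported.

```python
# Two modules from Python's standard library (no installation needed)
from statistics import mean
from collections import Counter

# Hypothetical example data: the year each object in a small collection was made
years = [1850, 1862, 1875, 1862, 1890]

# Hypothetical example data: the material of each object
materials = ['wood', 'metal', 'wood', 'paper', 'wood']

# One ready-made function calculates the average year
average_year = mean(years)
print(average_year)

# Another counts how often each material appears
material_counts = Counter(materials)
print(material_counts.most_common(1))  # the most common material and its count
```

Writing the same calculations by hand is possible, but using the existing functions keeps the code short and easier to read.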
Why do I need Python?:
Elsewhere, I have introduced PHP as a programming language suited to publishing whole heritage datasets online. However, Python is a better choice if you want to analyse a dataset for research purposes or visualise it for publication. Because of the extensive code libraries available, you can go from basic Python knowledge to more advanced data manipulation in a few steps. This means you don’t need a computer science degree to use Python for your research.
For an example of this, see the tutorial using a dataset of 19th-century dogs, which I created for my talk at the University of Edinburgh Centre for Data, Culture & Society: Skills in Heritage Data Science: Meet the Dogs of 19th Century Denmark.
Example of Python code:
# first we define two variables, calculate a third and print the result (11)
x = 3
y = 8
z = x + y
print(z)
# print the data type of the x variable, the result is <class 'int'>
print(type(x))
# create a list of artists
artists = ['Anna', 'Marie', 'Anne']
# make a for loop and print each of the values in the list
for artist in artists:
    print(artist)
Getting started with Python
All code needs to “run” in an environment that understands the language. Take HTML as an example – if you open HTML code in MS Word, you will probably just see the code as plain text. But if you open the code in a browser, it will render your code into website elements. The same goes for Python – it needs to run in an environment/program that understands the Python commands.
There are many options for this. You can install Python and an environment on your own computer – but I would suggest saving this option for later. There are also online Python environments (editors, notebooks and so on) available. Here I will use Google Colab, as it is part of the Google Drive setup that many people are already familiar with.
Go to https://colab.research.google.com. You will need to be logged into a Google/Gmail account; depending on whether you are already logged in, you will see some sort of welcome page. You can start by making a NEW NOTEBOOK. A Python notebook is a file (ending in .ipynb) made up of cells containing either text or code.
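As a possible first code cell in a new notebook, the sketch below simply checks which version of Python the environment is running. It uses only the standard library modules sys and platform, so it should work in Colab or any other Python environment. Type it into a code cell and run it with Shift+Enter (or the play button next to the cell).

```python
import sys
import platform

# Print the version of Python this notebook is running, e.g. 3.x.y
print(platform.python_version())

# Print True if this is Python 3, which is what this tutorial assumes
print(sys.version_info.major == 3)
```

If the cell prints a version number and True, the environment is ready for the examples in this tutorial.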
You can have a look at my beginners workbook here!
Python absolute beginners cheat sheet:
[This is still a work in progress] This one is for my 2020 students, who felt that all the online cheat sheets for Python beginners were anything but beginner-friendly. So here is a cheat sheet for absolute beginners.
Check out the accompanying beginners workbook to see it in action.
Variables – containers for data (text goes in quotation marks: x = "hello"; numbers go without: y = 5)
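To see the difference between text and numbers in practice, here is a small sketch with made-up values. Note that Python needs straight quotation marks (" or '); the curly "smart" quotes that word processors insert will cause an error if you copy code from a document.

```python
# Text (a string) goes inside quotation marks
x = "hello"

# Numbers go without quotation marks
y = 5

# type() tells you what kind of data a variable contains
print(type(x))  # <class 'str'>
print(type(y))  # <class 'int'>

# A number inside quotation marks is text, not a number
z = "5"
print(y + 1)    # 6
print(z + "1")  # 51 (two pieces of text joined together)
```

The last two lines show why the quotation marks matter: adding numbers does arithmetic, while adding text joins the pieces together.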
Most commonly used functions:
print() – outputs (displays) whatever is inside the brackets