Python: First Steps with MongoDB
I’m busy working my way through Kyle Banker’s MongoDB in Action. Much of the example code in the book is given in Ruby. Although I’d love to learn more about Ruby, for the moment it makes more sense for me to follow along with Python.
If you haven’t already installed MongoDB, now is the time to do it! On a Debian Linux system the installation is very simple.
Python Package Installation
Next install PyMongo, the Python driver for MongoDB.
Check that the install was successful.
Detailed documentation for PyMongo can be found here.
Creating a Client
To start interacting with the MongoDB server we need to instantiate a MongoClient. This will connect to localhost using the default port (27017). Alternative values for host and port can be specified.
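A minimal sketch of creating a client, assuming a mongod listening on localhost at the default port. The helper name connect() and the 2-second timeout are my own choices, not part of PyMongo. MongoClient connects lazily, so the sketch sends a cheap ping command to fail fast if no server is listening.

```python
def connect(host="localhost", port=27017, timeout_ms=2000):
    """Return a MongoClient for host:port after checking the server responds."""
    # Imported inside the function so this module can be loaded without PyMongo.
    from pymongo import MongoClient

    # serverSelectionTimeoutMS caps how long an operation waits for a server.
    client = MongoClient(host, port, serverSelectionTimeoutMS=timeout_ms)
    # "ping" is a trivial server command; it raises if the server is unreachable.
    client.admin.command("ping")
    return client
```

Calling connect() with no arguments targets the local server; passing a different host and port reaches a remote one.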
Connect to a Database
Next we connect to a particular database called test. If the database does not yet exist then it will be created.
Create a Collection
A database holds one or more collections of documents. We’ll create a collection for our user profiles.
As mentioned in the documentation, MongoDB is lazy about the creation of databases and collections. Neither the database nor collection is actually created until data are written to them.
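Getting handles on a database and a collection is just item (or attribute) access on the client. This is a sketch that assumes the test database and a collection named users. The collection name and the helper name are my assumptions. Neither lookup contacts the server, which is why nothing exists yet.

```python
def get_users_collection(client, db_name="test", collection_name="users"):
    """Return a collection handle; nothing is sent to the server here."""
    db = client[db_name]          # same as client.test when db_name is "test"
    return db[collection_name]    # same as db.users when collection_name is "users"
```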
Working with Documents
As you would expect, MongoDB supports the four basic CRUD operations: create, read, update and delete.
Documents are represented as dictionaries in Python. We’ll create a couple of lightweight user profiles.
We use the insert_one() method to store each document in the collection.
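Here is a sketch of inserting profiles one at a time. The field names and values in profiles are hypothetical. insert_one() returns an InsertOneResult whose inserted_id attribute holds the document's _id. If the dict has no _id, PyMongo also adds one to it in place.

```python
def insert_profiles(collection, profiles):
    """Store each profile with insert_one() and return the assigned _id values."""
    ids = []
    for profile in profiles:
        result = collection.insert_one(profile)  # an InsertOneResult
        ids.append(result.inserted_id)           # the _id stored with the document
    return ids


# Hypothetical profiles; any dict of JSON-like values can be a document,
# including nested dicts and lists.
profiles = [
    {"username": "alice", "email": "alice@example.com", "age": 31},
    {"username": "bob", "age": 27, "interests": ["chess", "hiking"],
     "address": {"city": "Cape Town"}},
]
```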
Each document is allocated a unique identifier, which is stored in its _id field.
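Although these identifiers look pretty random, they have a well-defined 12-byte structure, written as 24 hexadecimal characters. The first 8 characters (4 bytes) are a timestamp giving the creation time in seconds. In the original format this was followed by a 6-character (3-byte) machine identifier, then a 4-character (2-byte) process identifier and finally a 6-character (3-byte) counter. In current versions of MongoDB the machine and process identifiers have been replaced by a 5-byte random value generated once per process, while the 3-byte counter remains.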
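To make the byte layout concrete, here is a stdlib-only sketch that splits an ObjectId hex string into its fields, using the current layout. The helper name parse_object_id() is my own. In practice, bson.ObjectId (bundled with PyMongo) already exposes the timestamp as its generation_time property.

```python
from datetime import datetime, timezone


def parse_object_id(oid_hex):
    """Split a 24-character ObjectId hex string into its fields.

    Assumes the current layout: 4-byte timestamp, 5-byte random value,
    3-byte counter (all big-endian).
    """
    raw = bytes.fromhex(oid_hex)  # raises ValueError on non-hex input
    if len(raw) != 12:
        raise ValueError("an ObjectId is 12 bytes (24 hex characters)")
    seconds = int.from_bytes(raw[0:4], "big")  # seconds since the Unix epoch
    return {
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "random": raw[4:9].hex(),                     # fixed for a given process
        "counter": int.from_bytes(raw[9:12], "big"),  # incremented per new id
    }


print(parse_object_id("507f1f77bcf86cd799439011")["created"])  # 2012-10-17 21:13:27+00:00
```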
We can verify that the collection has been created.
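Here is one way to check, sketched with a hypothetical helper name. Database.list_collection_names() (PyMongo 3.7 and later) asks the server for the collections that really exist, so a collection that has only been referenced, never written to, will not appear.

```python
def collection_exists(db, name):
    """True if the named collection has actually been created in db."""
    # The server only reports collections that hold data (or were created explicitly).
    return name in db.list_collection_names()
```

For example, collection_exists(client.test, "users") should flip from False to True after the first insert.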
There’s also an insert_many() method, which can be used to insert multiple documents in a single call.
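This sketch uses a hypothetical helper name. insert_many() takes a list of documents and returns an InsertManyResult whose inserted_ids lists the _id values in order. With ordered=True (the default), the server stops at the first failed insert.

```python
def insert_profiles_bulk(collection, profiles):
    """Insert a list of documents in one call and return their _id values."""
    # The documents are sent to the server in batches, not one request each.
    result = collection.insert_many(profiles, ordered=True)
    return result.inserted_ids
```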
The find_one() method can be used to search the collection. As its name implies, it returns only a single document, or None if nothing matches.
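The query is itself a document: every key/value pair in it must match. This sketch assumes a hypothetical username field.

```python
def find_profile(collection, username):
    """Return the first profile with this username, or None if there is none."""
    return collection.find_one({"username": username})
```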
A more general query can be made using the find() method, which, rather than returning a document, returns a cursor that can be used to iterate over the results. With our minimal collection this doesn’t seem very useful, but a cursor really comes into its own with a massive collection, since the results are fetched from the server in batches rather than all at once.
A cursor is iterable, so the query results can be accessed neatly in a for loop.
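Here is a sketch of iterating over a cursor, assuming hypothetical username and age fields. $gt is MongoDB's "greater than" query operator. The loop pulls documents from the cursor as it advances.

```python
def usernames_over(collection, min_age):
    """Collect usernames of profiles older than min_age by iterating a cursor."""
    cursor = collection.find({"age": {"$gt": min_age}})  # a Cursor, not a list
    return [doc["username"] for doc in cursor]
```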
The sort() method can be applied to the cursor returned by find() to order the results.
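This is a sketch with a hypothetical helper and an assumed age field. Cursor.sort() and Cursor.limit() both return the cursor, so the calls can be chained. Direction 1 is ascending (pymongo.ASCENDING) and -1 is descending.

```python
def youngest(collection, n=2):
    """Return the usernames of the n youngest profiles."""
    cursor = collection.find().sort("age", 1).limit(n)  # ascending by age, first n
    return [doc["username"] for doc in cursor]
```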
The update() method is used to modify existing documents. It takes two documents as arguments: the first is used to match the documents to which the change is to be applied, and the second gives the details of the change. Note that update() was deprecated in PyMongo 3 and removed in PyMongo 4; current code should use update_one() or update_many(), which take the same pair of documents.
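With the current API, the same filter-plus-change pair goes to update_one(). This sketch uses a hypothetical helper and field names. The returned UpdateResult reports how many documents matched and how many were actually modified.

```python
def change_email(collection, username, new_email):
    """Set a new email on the first profile matching username."""
    result = collection.update_one(
        {"username": username},          # filter: which document to change
        {"$set": {"email": new_email}},  # update: what to change
    )
    return result.matched_count, result.modified_count
```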
The example above uses the $set modifier, which assigns a value to a field. There are a number of other modifiers (update operators) available.
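For illustration, this sketch combines three other update operators, $inc, $push and $unset, in a single update_one() call. The helper and field names are hypothetical. $inc and $push create their fields if they are missing, and the value given to $unset is ignored.

```python
from datetime import datetime, timezone


def record_login(collection, username):
    """Bump a login counter, append a timestamp and drop a temporary field."""
    return collection.update_one(
        {"username": username},
        {
            "$inc": {"login_count": 1},                       # add 1 to a number
            "$push": {"logins": datetime.now(timezone.utc)},  # append to an array
            "$unset": {"temporary_password": ""},             # remove a field
        },
    )
```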
By default, update() applies the change only to the first matching record. To apply it to all matching records, specify multi=True, or, in current PyMongo, call update_many().
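Here is a sketch of the multi-document case using update_many(), which plays the role of update(..., multi=True). The helper and the active field are hypothetical.

```python
def deactivate_older_than(collection, age):
    """Flag every profile older than age as inactive; return how many changed."""
    result = collection.update_many({"age": {"$gt": age}}, {"$set": {"active": False}})
    return result.modified_count
```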
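Deleting records happens via the remove() method, with an argument that specifies which records are to be deleted. remove() was deprecated in PyMongo 3 and dropped in PyMongo 4; its replacements are delete_one() and delete_many().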
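This sketch shows both current delete methods, with hypothetical helper and field names. Each returns a DeleteResult whose deleted_count says how many documents went.

```python
def delete_profile(collection, username):
    """Delete at most one matching profile; return the number deleted (0 or 1)."""
    return collection.delete_one({"username": username}).deleted_count


def delete_inactive(collection):
    """Delete every profile flagged inactive; return the number deleted."""
    return collection.delete_many({"active": False}).deleted_count
```

Be careful with the filter: an empty document {} matches everything, so delete_many({}) empties the whole collection.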
Well, those are the basic operations. Nothing too scary. I’ll be back with the Python implementation of the Twitter archival sample application.