Introduction to MongoDB (Chapter 1)
MongoDB is a document-oriented database, not a relational one. The primary reason for moving away from the relational model is to make scaling out easier, but there are some other advantages as well. A document-oriented database replaces the concept of a “row” with a more flexible model, the “document.” By allowing embedded documents and arrays, the document-oriented approach makes it possible to represent complex hierarchical relationships with a single record. This fits naturally into the way developers in modern object-oriented languages think about their data. There are also no predefined schemas: a document’s keys and values are not of fixed types or sizes. Without a fixed schema, adding or removing fields as needed becomes easier. Generally, this makes development faster as developers can quickly iterate. It is also easier to experiment. Developers can try dozens of models for the data and then choose the best one to pursue. Here are some advantages of using the database:
It is easily scalable.
It supports many features beyond creating, reading, updating, and deleting data, such as indexing, aggregation, special collection types, and file storage.
It is fast.
Document
At the heart of MongoDB is the document: an ordered set of keys with associated values. The representation of a document varies by programming language, but most languages have a data structure that is a natural fit, such as a map, hash, or dictionary. A document is the basic unit of data for MongoDB and is roughly equivalent to a row in a relational database management system (but much more expressive). For example:
{"greeting" : "Hello, world!"}
This simple document contains a single key, "greeting", with a value of "Hello, world!". Most documents will be more complex than this simple one and often will contain multiple key/value pairs.
{"greeting" : "Hello, world!", "foo" : 3}
Naming a key
The keys in a document are strings. Any UTF-8 character is allowed in a key, with a few notable exceptions:
Keys must not contain the character \0 (the null character). This character is used to signify the end of a key.
The . and $ characters have some special properties and should be used only in certain circumstances, as described in later chapters. In general, they should be considered reserved, and drivers will complain if they are used inappropriately.
MongoDB is type-sensitive and case-sensitive. For example, these documents are distinct:
{"foo" : 3}
{"foo" : "3"}
as are these:
{"foo" : 3}
{"Foo" : 3}
A final important thing to note is that documents in MongoDB cannot contain duplicate keys.
Collections
A collection is a group of documents. If a document is the MongoDB analog of a row in a relational database, then a collection can be thought of as the analog to a table. Collections have dynamic schemas. This means that the documents within a single collection can have any number of different “shapes.”
Naming a collection
A collection is identified by its name. Collection names can be any UTF-8 string, with a few restrictions:
The empty string ("") is not a valid collection name.
Collection names may not contain the character \0 (the null character) because this delineates the end of a collection name.
You should not create any collections that start with system., a prefix reserved for internal collections. For example, the system.users collection contains the database’s users, and the system.namespaces collection contains information about all of the database’s collections.
User-created collections should not contain the reserved character $ in the name. The various drivers available for the database do support using $ in collection names because some system-generated collections contain it. You should not use $ in a name unless you are accessing one of these collections.
Databases
In addition to grouping documents by collection, MongoDB groups collections into databases. A single instance of MongoDB can host several databases, each grouping together zero or more collections. A database has its own permissions, and each database is stored in separate files on disk. A good rule of thumb is to store all data for a single application in the same database. Separate databases are useful when storing data for several applications or users on the same MongoDB server.
Naming a Database
The empty string ("") is not a valid database name.
A database name cannot contain any of these characters: /, \, ., ", *, <, >, :, |, ?, $, (a single space), or \0 (the null character). Basically, stick with alphanumeric ASCII.
Database names are case-sensitive, even on non-case-sensitive filesystems. To keep things simple, try to just use lowercase characters.
Database names are limited to a maximum of 64 bytes.
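The database naming rules above can be checked mechanically. The following is a minimal sketch in plain JavaScript; isValidDatabaseName is a hypothetical helper, not part of MongoDB or its drivers, and it encodes only the restrictions listed in this section.

```javascript
// Hypothetical helper: checks a name against the database naming rules listed above.
// It is not a MongoDB API; the server performs its own validation.
const FORBIDDEN_DB_CHARS = ["/", "\\", ".", '"', "*", "<", ">", ":", "|", "?", "$", " ", "\0"];

function isValidDatabaseName(name) {
  // The empty string is not a valid database name.
  if (typeof name !== "string" || name === "") return false;
  // None of the reserved characters may appear anywhere in the name.
  if (FORBIDDEN_DB_CHARS.some((ch) => name.includes(ch))) return false;
  // The limit is in bytes, not characters, so measure the UTF-8 encoding.
  return new TextEncoder().encode(name).length <= 64;
}

console.log(isValidDatabaseName("blog"));          // true
console.log(isValidDatabaseName("my.db"));         // false
console.log(isValidDatabaseName(""));              // false
console.log(isValidDatabaseName("a".repeat(65)));  // false
```

Measuring the encoded length rather than name.length matters only for non-ASCII names, which is one more reason to stick with lowercase alphanumeric ASCII.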
Getting started with MongoDB
MongoDB is almost always run as a network server that clients can connect to and perform operations on. Download MongoDB and decompress it. To start the server, run the mongod executable:
mongod #starts the MongoDB server daemon
Running the Shell
To start the shell, run the mongo executable:
mongo #starts the shell, which prints its version and connects to the running MongoDB server
Basic Operations with the Shell
Create
The insert function adds a document to a collection. For example, suppose we want to store a blog post. First, we’ll create a local variable called post that is a JavaScript object representing our document. It will have the keys "title", "content", and "date" (the date that it was published).
post = {"title" : "My Blog Post",
... "content" : "Here's my blog post.",
... "date" : new Date()}
{
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2012-08-24T21:12:09.982Z") }
db.blog.insert(post)
The blog post has been saved to the database. We can see it by calling find on the collection:
db.blog.find()
{
"_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2012-08-24T21:12:09.982Z") }
Read
find and findOne can be used to query a collection. If we just want to see one document from a collection, we can use findOne.
db.blog.findOne()
{
"_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2012-08-24T21:12:09.982Z") }
find and findOne can also be passed criteria in the form of a query document. This will restrict the documents matched by the query. The shell will automatically display up to 20 documents matching a find, but more can be fetched.
Update
If we would like to modify our post, we can use update. update takes (at least) two parameters: the first is the criteria to find which document to update, and the second is the new document. Suppose we decide to enable comments on the blog post we created earlier.
post.comments = []
db.blog.update({title : "My Blog Post"}, post)
output
db.blog.find()
{
"_id" : ObjectId("5037ee4a1084eb3ffeef7228"),
"title" : "My Blog Post",
"content" : "Here's my blog post.",
"date" : ISODate("2012-08-24T21:12:09.982Z"),
"comments" : [ ] }
Delete
remove permanently deletes documents from the database. Called with no parameters, it removes all documents from a collection. It can also take a document specifying criteria for removal. For example, this would remove the post we just created:
db.blog.remove({title : "My Blog Post"})
Data Types
null Null can be used to represent both a null value and a nonexistent field:
{"x" : null}
boolean There is a boolean type, which can be used for the values true and false:
{"x" : true}
number The shell defaults to using 64-bit floating point numbers. Thus, these numbers look “normal” in the shell:
{"x" : 3.14}
or:
{"x" : 3}
For integers, use the NumberInt or NumberLong classes, which represent 4-byte or 8-byte signed integers, respectively.
{"x" : NumberInt("3")}
{"x" : NumberLong("3")}
string Any string of UTF-8 characters can be represented using the string type:
{"x" : "foobar"}
date Dates are stored as milliseconds since the epoch. The time zone is not stored:
{"x" : new Date()}
regular expression Queries can use regular expressions using JavaScript’s regular expression syntax:
{"x" : /foobar/i}
array Sets or lists of values can be represented as arrays:
{"x" : ["a", "b", "c"]}
embedded document Documents can contain entire documents embedded as values in a parent document:
{"x" : {"foo" : "bar"}}
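Because the shell is a JavaScript interpreter, plain JavaScript can illustrate two of the points above: numbers are 64-bit floating point by default, and dates are a count of milliseconds since the epoch. This is a small sketch that runs without a server; it does not use NumberInt or NumberLong, which exist only in the shell.

```javascript
// Numbers: 3 and 3.0 are the same 64-bit floating-point value.
const sameNumber = 3 === 3.0;
console.log(sameNumber);                 // true

// A double cannot represent every large integer exactly, which is one
// reason to use a dedicated integer type for large integer values.
const precisionLost = 2 ** 53 === 2 ** 53 + 1;
console.log(precisionLost);              // true

// Dates: a Date wraps a millisecond count since the epoch; no time zone is stored.
const epoch = new Date(0);
console.log(epoch.getTime());            // 0
console.log(epoch.toISOString());        // 1970-01-01T00:00:00.000Z
```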
Creating a Document
Inserts are the basic method for adding data to MongoDB. To insert a document into a collection, use the collection’s insert method:
db.foo.insert({"bar" : "baz"})
Batch Insert
If you have a situation where you are inserting multiple documents into a collection, you can make the insert faster by using batch inserts. Batch inserts allow you to pass an array of documents to the database. In the shell, pass the array to insert:
db.foo.insert([{"_id" : 0}, {"_id" : 1}, {"_id" : 2}])
output
db.foo.find()
{ "_id" : 0 }
{ "_id" : 1 }
{ "_id" : 2 }
Removing Documents
To remove documents from a collection, use the remove method:
db.foo.remove() //removes all documents from the foo collection
The remove function optionally takes a query document as a parameter. When it’s given, only documents that match the criteria will be removed. Suppose, for instance, that we want to remove everyone from the mailing.list collection where the value for "opt-out" is true:
db.mailing.list.remove({"opt-out" : true})
Updating Documents
Once a document is stored in the database, it can be changed using the update method. update takes two parameters: a query document, which locates documents to update, and a modifier document, which describes the changes to make to the documents found.
joe = db.people.findOne({"name" : "joe", "age" : 20})
db.people.update({"_id" : ObjectId("4b2b9f67a1f631733d917a7c")}, joe)
Using $set modifier
$set sets the value of a field. If the field does not yet exist, it will be created. This can be handy for updating schema or adding user-defined keys.
db.users.findOne()
{
"_id" : ObjectId("4b253b067525f35f94b60a31"),
"name" : "joe",
"age" : 30,
"sex" : "male",
"location" : "Wisconsin"
}
db.users.update({"_id" : ObjectId("4b253b067525f35f94b60a31")},
... {"$set" : {"favorite book" : "War and Peace"}})
output
db.users.findOne()
{
"_id" : ObjectId("4b253b067525f35f94b60a31"),
"name" : "joe",
"age" : 30,
"sex" : "male",
"location" : "Wisconsin",
"favorite book" : "War and Peace"
}
If the user realizes that he actually doesn’t like reading, he can remove the key altogether with "$unset":
db.users.update({"name" : "joe"},
... {"$unset" : {"favorite book" : 1}})
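To make the effect of "$set" and "$unset" concrete, here is a simplified in-memory model that works on plain JavaScript objects. applyModifier is a hypothetical helper written for this illustration, not a MongoDB function; it handles only top-level keys and ignores dot notation.

```javascript
// Simplified model of "$set" and "$unset" on a plain object.
// Returns a modified copy and leaves the original document untouched.
function applyModifier(doc, modifier) {
  const result = { ...doc };
  for (const [key, value] of Object.entries(modifier.$set || {})) {
    result[key] = value;       // creates the key if it does not exist yet
  }
  for (const key of Object.keys(modifier.$unset || {})) {
    delete result[key];        // removes the key altogether
  }
  return result;
}

const joe = { name: "joe", age: 30, sex: "male", location: "Wisconsin" };

const withBook = applyModifier(joe, { $set: { "favorite book": "War and Peace" } });
console.log(withBook["favorite book"]);         // War and Peace

const withoutBook = applyModifier(withBook, { $unset: { "favorite book": 1 } });
console.log("favorite book" in withoutBook);    // false
```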
Incrementing and decrementing
The "$inc" modifier can be used to change the value for an existing key or to create a new key if it does not already exist. It is very useful for updating analytics, karma, votes, or anything else that has a changeable, numeric value.
db.games.insert({"game" : "pinball", "user" : "joe"})
db.games.update({"game" : "pinball", "user" : "joe"},
... {"$inc" : {"score" : 50}}) //increases the score by 50
output
db.games.findOne()
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"game" : "pinball",
"user" : "joe",
"score" : 50
}
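The behavior of "$inc" can be sketched on a plain object as follows. applyInc is a hypothetical helper for illustration only, and it covers just top-level numeric keys. A negative amount decrements the value.

```javascript
// Simplified model of "$inc": add an amount to a key, creating the key if needed.
function applyInc(doc, increments) {
  const result = { ...doc };
  for (const [key, amount] of Object.entries(increments)) {
    // A missing key is treated as 0, so it ends up equal to the amount.
    result[key] = (key in result ? result[key] : 0) + amount;
  }
  return result;
}

let game = { game: "pinball", user: "joe" };
game = applyInc(game, { score: 50 });
console.log(game.score);    // 50
game = applyInc(game, { score: 10000 });
console.log(game.score);    // 10050
game = applyInc(game, { score: -50 });   // a negative amount decrements
console.log(game.score);    // 10000
```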
Adding elements
"$push" adds elements to the end of an array if the array exists and creates a new array if it does not. For example, suppose that we are storing blog posts and want to add a "comments" key containing an array. We can push a comment onto the nonexistent "comments" array, which will create the array and add the comment.
db.blog.posts.findOne()
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"title" : "A blog post",
"content" : "..."
}
db.blog.posts.update({"title" : "A blog post"},
... {"$push" : {"comments" :
... {"name" : "joe", "email" : "joe@example.com",
... "content" : "nice post."}}})
output
db.blog.posts.findOne()
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"title" : "A blog post",
"content" : "...",
"comments" : [ {
"name" : "joe",
"email" : "joe@example.com",
"content" : "nice post."
} ] }
Removing elements
There are a few ways to remove elements from an array. If you want to treat the array like a queue or a stack, you can use "$pop", which can remove elements from either end. {"$pop" : {"key" : 1}} removes an element from the end of the array. {"$pop" :{"key" : -1}} removes it from the beginning. Sometimes an element should be removed based on specific criteria, rather than its position in the array. "$pull" is used to remove elements of an array that match the given criteria. For example, suppose we have a list of things that need to be done but not in any specific order.
db.lists.insert({"todo" : ["dishes", "laundry", "dry cleaning"]})
db.lists.update({}, {"$pull" : {"todo" : "laundry"}})
output
db.lists.find()
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"todo" : [
"dishes",
"dry cleaning"
] }
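The array modifiers "$push", "$pop", and "$pull" behave much like the following plain-array functions. The helper names are illustrative only, and each returns a new array rather than updating a stored document.

```javascript
// "$push": append to the array, creating it if it does not exist.
const push = (arr, element) => [...(arr || []), element];
// "$pop": 1 removes from the end, -1 removes from the beginning.
const pop = (arr, direction) => (direction === 1 ? arr.slice(0, -1) : arr.slice(1));
// "$pull": remove every element equal to the given value.
const pull = (arr, value) => arr.filter((el) => el !== value);

const todo = ["dishes", "laundry", "dry cleaning"];
console.log(pull(todo, "laundry").join(", "));      // dishes, dry cleaning
console.log(pop(todo, 1).join(", "));               // dishes, laundry
console.log(pop(todo, -1).join(", "));              // laundry, dry cleaning
console.log(push(undefined, "nice post.").length);  // 1
```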
Querying
find method
The find method is used to perform queries in MongoDB. Querying returns a subset of documents in a collection, from no documents at all to the entire collection. Which documents get returned is determined by the first argument to find, which is a document specifying the query criteria. An empty query document (i.e., {}) matches everything in the collection. If find isn’t given a query document, it defaults to {}. For example:
db.c.find() //matches every document in the collection c
When we start adding key/value pairs to the query document, we begin restricting our search. For example, to find all documents where the value for "age" is 27, we can add that key/value pair to the query document:
db.users.find({"age" : 27}) //returns all users with age 27
db.users.find({"username" : "joe"}) //returns all users with username "joe"
db.users.find({"username" : "joe", "age" : 27}) //returns all users with username "joe" and age 27
If you have a user collection and you are interested only in the "username" and "email" keys, you could return just those keys with the following query:
db.users.find({}, {"username" : 1, "email" : 1}) //returns username and email (plus "_id")
or
db.users.find({}, {"username" : 1, "_id" : 0}) //returns the username but not the "_id"
output
{
"_id" : ObjectId("4ba0f0dfd22aa494fd523620"),
"username" : "joe",
"email" : "joe@example.com"
}
{
"username" : "joe"
}
Query Conditionals
"$lt", "$lte", "$gt", and "$gte" are all comparison operators, corresponding to <, <=, >, and >=, respectively. They can be combined to look for a range of values. For example, to look for users who are between the ages of 18 and 30, we can do this:
db.users.find({"age" : {"$gte" : 18, "$lte" : 30}})
This would find all documents where the "age" field was greater than or equal to 18 AND less than or equal to 30. These types of range queries are often useful for dates. For example, to find people who registered before January 1, 2007, we can do this:
start = new Date("01/01/2007")
db.users.find({"registered" : {"$lt" : start}})
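To see how several comparison operators on one key combine, here is a minimal in-memory sketch. matchesRange and the sample users are made up for this illustration; every operator in the condition document must hold for a value to match.

```javascript
// Hypothetical matcher for the four comparison operators on a single value.
const comparisons = {
  $lt: (value, bound) => value < bound,
  $lte: (value, bound) => value <= bound,
  $gt: (value, bound) => value > bound,
  $gte: (value, bound) => value >= bound,
};

// All operators listed in the condition must be satisfied (an implicit AND).
function matchesRange(value, condition) {
  return Object.entries(condition).every(([op, bound]) => comparisons[op](value, bound));
}

// Made-up sample data.
const users = [
  { username: "amy", age: 17, registered: new Date(2006, 5, 1) },
  { username: "joe", age: 27, registered: new Date(2008, 0, 15) },
  { username: "sue", age: 30, registered: new Date(2006, 11, 31) },
  { username: "tom", age: 45, registered: new Date(2010, 2, 3) },
];

const inAgeRange = users.filter((u) => matchesRange(u.age, { $gte: 18, $lte: 30 }));
console.log(inAgeRange.map((u) => u.username).join(", "));   // joe, sue

const start = new Date(2007, 0, 1);
const early = users.filter((u) => matchesRange(u.registered, { $lt: start }));
console.log(early.map((u) => u.username).join(", "));        // amy, sue
```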
To query for documents where a key’s value is not equal to a certain value, you must use another conditional operator, "$ne", which stands for “not equal.” If you want to find all users who do not have the username “joe,” you can query for them using this:
db.users.find({"username" : {"$ne" : "joe"}})
"$ne" can be used with any type.
OR Queries
There are two ways to do an OR query in MongoDB. "$in" can be used to query for a variety of values for a single key. "$or" is more general; it can be used to query for any of the given values across multiple keys. If you have more than one possible value to match for a single key, use an array of criteria with "$in". For instance, suppose we were running a raffle and the winning ticket numbers were 725, 542, and 390. To find all three of these documents, we can construct the following query:
db.raffle.find({"ticket_no" : {"$in" : [725, 542, 390]}})
"$in" is very flexible and allows you to specify criteria of different types as well as values. For example, if we are gradually migrating our schema to use usernames instead of user ID numbers, we can query for either by using this:
db.users.find({"user_id" : {"$in" : [12345, "joe"]}})
This matches documents with a "user_id" equal to 12345, and documents with a "user_id" equal to "joe". The opposite of "$in" is "$nin", which returns documents that don’t match any of the criteria in the array. If we want to return all of the people who didn’t win anything in the raffle, we can query for them with this:
db.raffle.find({"ticket_no" : {"$nin" : [725, 542, 390]}})
This query returns everyone who did not have tickets with those numbers. "$in" gives you an OR query for a single key, but what if we need to find documents where "ticket_no" is 725 or "winner" is true? For this type of query, we’ll need to use the "$or" conditional. "$or" takes an array of possible criteria. In the raffle case, using "$or" would look like this:
db.raffle.find({"$or" : [{"ticket_no" : 725}, {"winner" : true}]})
"$or" can contain other conditionals. For example, to match any of the three "ticket_no" values or the "winner" key, we can use this:
db.raffle.find({"$or" : [{"ticket_no" : {"$in" : [725, 542, 390]}}, {"winner" : true}]})
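The difference between "$in", "$nin", and "$or" can be illustrated with ordinary array filtering. This is only a sketch over made-up raffle data, not how the server evaluates queries.

```javascript
// Made-up sample data for the raffle collection.
const raffle = [
  { ticket_no: 725, winner: false },
  { ticket_no: 542, winner: false },
  { ticket_no: 100, winner: true },
  { ticket_no: 200, winner: false },
];
const winningNumbers = [725, 542, 390];
const ticketsOf = (docs) => docs.map((d) => d.ticket_no);

// "$in": one key, any of several values.
const inMatches = ticketsOf(raffle.filter((d) => winningNumbers.includes(d.ticket_no)));
console.log(inMatches.join(", "));     // 725, 542

// "$nin": the key's value is none of the listed values.
const ninMatches = ticketsOf(raffle.filter((d) => !winningNumbers.includes(d.ticket_no)));
console.log(ninMatches.join(", "));    // 100, 200

// "$or": any of several criteria, possibly on different keys.
const orMatches = ticketsOf(raffle.filter((d) => d.ticket_no === 725 || d.winner === true));
console.log(orMatches.join(", "));     // 725, 100

// "$or" whose first criterion is itself an "$in".
const combined = ticketsOf(
  raffle.filter((d) => winningNumbers.includes(d.ticket_no) || d.winner === true)
);
console.log(combined.join(", "));      // 725, 542, 100
```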
$not
"$not" is a metaconditional: it can be applied on top of any other criteria. As an example, let’s consider the modulus operator, "$mod". "$mod" queries for keys whose values, when divided by the first value given, have a remainder of the second value:
db.users.find({"id_num" : {"$mod" : [5, 1]}})
The previous query returns users with "id_num"s of 1, 6, 11, 16, and so on. If we want, instead, to return users with "id_num"s of 2, 3, 4, 5, 7, 8, 9, 10, 12, etc., we can use "$not":
db.users.find({"id_num" : {"$not" : {"$mod" : [5, 1]}}})
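The arithmetic behind these two queries can be checked in plain JavaScript. The sketch below applies the same remainder test, and its negation, to the numbers 1 through 16.

```javascript
// id_num values 1..16, standing in for documents in the users collection.
const idNums = Array.from({ length: 16 }, (_, i) => i + 1);

// {"$mod" : [5, 1]}: dividing by 5 leaves a remainder of 1.
const remainderOne = (n) => n % 5 === 1;

const modMatches = idNums.filter(remainderOne);
console.log(modMatches.join(", "));    // 1, 6, 11, 16

// {"$not" : {"$mod" : [5, 1]}}: everything the inner condition rejects.
const notMatches = idNums.filter((n) => !remainderOne(n));
console.log(notMatches.join(", "));    // 2, 3, 4, 5, 7, 8, 9, 10, 12, 13, 14, 15
```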
"$not" can be particularly useful in conjunction with regular expressions to find all documents that don’t match a given pattern.
Conditional Semantics
Multiple conditions can be put on a single key. For example, to find all users between the ages of 20 and 30, we can query for both "$gt" and "$lt" on the "age" key:
db.users.find({"age" : {"$lt" : 30, "$gt" : 20}})
There are a few “meta-operators” that go in the outer document: "$and", "$or", and "$nor". They all have a similar form:
db.users.find({"$and" : [{"x" : {"$lt" : 1}}, {"x" : 4}]})
This query would match documents with an "x" field both less than 1 and equal to 4. Although these seem like contradictory conditions, it is possible to fulfill if the "x" field is an array: {"x" : [0, 4]} would match. Note that the query optimizer does not optimize "$and" as well as other operators. This query would be more efficient to structure as:
db.users.find({"x" : {"$lt" : 1, "$in" : [4]}})
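The reason an array can satisfy two seemingly contradictory conditions is that each condition may be met by a different element. The following sketch models that rule under a simplifying assumption: a condition holds if the value, or any element of an array value, satisfies it (whole-array equality is ignored here).

```javascript
// Treat a scalar as a one-element array so both cases share one code path.
const elementsOf = (value) => (Array.isArray(value) ? value : [value]);

const lessThanOne = (value) => elementsOf(value).some((el) => el < 1);   // {"$lt" : 1}
const equalsFour = (value) => elementsOf(value).some((el) => el === 4);  // 4

// Made-up sample documents.
const docs = [{ x: [0, 4] }, { x: 4 }, { x: 0 }, { x: [5, 4] }];

const matches = docs.filter((d) => lessThanOne(d.x) && equalsFour(d.x));
console.log(JSON.stringify(matches));   // [{"x":[0,4]}]
```

Only the first document matches: 0 satisfies the "less than 1" condition and 4 satisfies the equality, while no scalar value can do both.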
Regular Expressions
Regular expressions are useful for flexible string matching. For example, if we want to find all users with the name Joe or joe, we can use a regular expression to do case insensitive matching:
db.users.find({"name" : /joe/i})
Querying Arrays
Querying for elements of an array is designed to behave the way querying for scalars does. For example, if the array is a list of fruits, like this:
db.food.insert({"fruit" : ["apple", "banana", "peach"]})
the following query:
db.food.find({"fruit" : "banana"})
will successfully match the document. We can query for it in much the same way as we would if we had a document that looked like the (illegal) document: {"fruit" : "apple", "fruit" : "banana", "fruit" : "peach"}.
$all
If you need to match arrays by more than one element, you can use "$all". This allows you to match a list of elements. For example, suppose we created a collection with three documents:
db.food.insert({"_id" : 1, "fruit" : ["apple", "banana", "peach"]})
db.food.insert({"_id" : 2, "fruit" : ["apple", "kumquat", "orange"]})
db.food.insert({"_id" : 3, "fruit" : ["cherry", "banana", "apple"]})
We can find all documents with both "apple" and "banana" elements by querying with "$all":
db.food.find({fruit : {$all : ["apple", "banana"]}})
output
{"_id" : 1, "fruit" : ["apple", "banana", "peach"]}
{"_id" : 3, "fruit" : ["cherry", "banana", "apple"]}
You can also query by exact match using the entire array. However, exact match will not match a document if any elements are missing or superfluous. For example, this will match the first document above:
db.food.find({"fruit" : ["apple", "banana", "peach"]})
But this will not:
db.food.find({"fruit" : ["apple", "banana"]})
and neither will this:
db.food.find({"fruit" : ["banana", "apple", "peach"]})
If you want to query for a specific element of an array, you can specify an index using the syntax key.index:
db.food.find({"fruit.2" : "peach"})
Arrays are always 0-indexed, so this would match the third array element against the string "peach".
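The three array-matching styles above ("$all", exact match, and positional match) can be compared side by side on the same three documents. This is an in-memory sketch with hypothetical helper names, not the server's implementation.

```javascript
const food = [
  { _id: 1, fruit: ["apple", "banana", "peach"] },
  { _id: 2, fruit: ["apple", "kumquat", "orange"] },
  { _id: 3, fruit: ["cherry", "banana", "apple"] },
];
const idsOf = (docs) => docs.map((d) => d._id).join(", ");

// "$all": every listed element is present; order does not matter.
const hasAll = (arr, wanted) => wanted.every((el) => arr.includes(el));
// Exact match: same elements, same order, nothing missing or extra.
const sameArray = (a, b) => a.length === b.length && a.every((el, i) => el === b[i]);

const allMatch = idsOf(food.filter((d) => hasAll(d.fruit, ["apple", "banana"])));
const exactFull = idsOf(food.filter((d) => sameArray(d.fruit, ["apple", "banana", "peach"])));
const exactShort = idsOf(food.filter((d) => sameArray(d.fruit, ["apple", "banana"])));
const exactReordered = idsOf(food.filter((d) => sameArray(d.fruit, ["banana", "apple", "peach"])));
const byIndex = idsOf(food.filter((d) => d.fruit[2] === "peach"));   // "fruit.2"

console.log(allMatch);         // 1, 3
console.log(exactFull);        // 1
console.log(exactShort === "" && exactReordered === "");   // true
console.log(byIndex);          // 1
```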
$size
A useful conditional for querying arrays is "$size", which allows you to query for arrays of a given size. Here’s an example:
db.food.find({"fruit" : {"$size" : 3}})
One common query is to get a range of sizes. "$size" cannot be combined with another $ conditional (in this example, "$gt"), but this query can be accomplished by adding a "size" key to the document. Then, every time you add an element to the array, increment the value of "size". If the original update looked like this:
db.food.update(criteria, {"$push" : {"fruit" : "strawberry"}})
it can simply be changed to this:
db.food.update(criteria,
... {"$push" : {"fruit" : "strawberry"}, "$inc" : {"size" : 1}})
The $slice operator
The special "$slice" operator can be used to return a subset of elements for an array key. For example, suppose we had a blog post document and we wanted to return the first 10 comments:
db.blog.posts.findOne(criteria, {"comments" : {"$slice" : 10}})
Alternatively, if we wanted the last 10 comments, we could use −10:
db.blog.posts.findOne(criteria, {"comments" : {"$slice" : -10}})
"$slice" can also return pages in the middle of the results by taking an offset and the number of elements to return:
db.blog.posts.findOne(criteria, {"comments" : {"$slice" : [23, 10]}})
This would skip the first 23 elements and return the 24th through 33rd. If there were fewer than 33 elements in the array, it would return as many as possible. "$slice" can also be combined with other keys in the same projection document to return only selected keys:
db.books.find({}, {"title" : 1, "reviews" : {"$slice" : 10}})
This returns every document in the collection with only the "_id", "title", and "reviews" keys, and with "reviews" limited to its first 10 elements.
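The three forms of "$slice" correspond closely to JavaScript's Array.prototype.slice, which makes the element counts easy to verify. The comments array below is made-up sample data.

```javascript
// 40 made-up comments: "comment 1" through "comment 40".
const comments = Array.from({ length: 40 }, (_, i) => `comment ${i + 1}`);

const first10 = comments.slice(0, 10);        // like {"$slice" : 10}
const last10 = comments.slice(-10);           // like {"$slice" : -10}
const middle = comments.slice(23, 23 + 10);   // like {"$slice" : [23, 10]}

console.log(first10[0], first10[9]);   // comment 1 comment 10
console.log(last10[0], last10[9]);     // comment 31 comment 40
console.log(middle[0], middle[9]);     // comment 24 comment 33

// With fewer than 33 elements, the slice returns as many as are available.
const shortArray = comments.slice(0, 30);
const partial = shortArray.slice(23, 23 + 10);
console.log(partial.length);           // 7
```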
Returning a matching array element
"$slice" is helpful when you know the index of the element, but sometimes you want whichever array element matched your criteria. You can return the matching element with the $-operator. Given the blog example above, you could get Bob’s comment back with:
db.blog.posts.find({"comments.name" : "bob"}, {"comments.$" : 1})
output
{
"_id" : ObjectId("4b2d75476cc613d5ee930164"),
"comments" : [ {
"name" : "bob",
"email" : "bob@example.com",
"content" : "good post."
} ] }
Querying on Embedded Documents
There are two ways of querying for an embedded document: querying for the whole document or querying for its individual key/value pairs. Querying for an entire embedded document works identically to a normal query. For example, if we have a document that looks like this:
{
"name" : {
"first" : "Joe",
"last" : "Schmoe"
},
"age" : 45
}
we can query for someone named Joe Schmoe with the following:
db.people.find({"name" : {"first" : "Joe", "last" : "Schmoe"}})
A query for the whole embedded document must match it exactly, including the order of its keys. To query on individual keys of the embedded document instead, use dot notation:
db.people.find({"name.first" : "Joe", "name.last" : "Schmoe"})
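The practical difference between the two query styles shows up when the embedded document gains a key. The sketch below models a whole-document query as an exact, order-sensitive comparison and a dot-notation query as a path lookup. The helper names and the "middle" key are hypothetical, added only for illustration.

```javascript
// Whole-document match, modeled as an exact and order-sensitive comparison.
const sameDocument = (a, b) => JSON.stringify(a) === JSON.stringify(b);

// Dot notation, modeled by walking the path one key at a time.
const getPath = (doc, path) =>
  path.split(".").reduce((value, key) => (value == null ? undefined : value[key]), doc);

const person = { name: { first: "Joe", last: "Schmoe" }, age: 45 };
const withMiddle = { name: { first: "Joe", middle: "Q", last: "Schmoe" }, age: 45 };

const byWhole = (doc) => sameDocument(doc.name, { first: "Joe", last: "Schmoe" });
const byKeys = (doc) =>
  getPath(doc, "name.first") === "Joe" && getPath(doc, "name.last") === "Schmoe";

console.log(byWhole(person), byKeys(person));           // true true
console.log(byWhole(withMiddle), byKeys(withMiddle));   // false true
```

Querying by individual keys keeps working when the embedded document changes shape, which is why it is usually the more robust choice.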
Limits, Skips, and Sorts
The most common query options are limiting the number of results returned, skipping a number of results, and sorting. All these options must be added before a query is sent to the database. To set a limit, chain the limit function onto your call to find. For example, to only return three results, use this:
db.c.find().limit(3) //returns at most the first 3 matching documents
skip works similarly to limit:
db.c.find().skip(3)
This will skip the first three matching documents and return the rest of the matches. If there are fewer than three documents in your collection, it will not return any documents. sort takes an object: a set of key/value pairs where the keys are key names and the values are the sort directions. Sort direction can be 1 (ascending) or −1 (descending). If multiple keys are given, the results will be sorted in that order. For instance, to sort the results by "username" ascending and "age" descending, we do the following:
db.c.find().sort({username : 1, age : -1})
These three methods can be combined. This is often handy for pagination. For example, suppose that you are running an online store and someone searches for mp3. If you want 50 results per page sorted by price from high to low, you can do the following:
db.stock.find({"desc" : "mp3"}).limit(50).sort({"price" : -1})
If that person clicks Next Page to see more results, you can simply add a skip to the query, which will skip over the first 50 matches (which the user already saw on page 1):
db.stock.find({"desc" : "mp3"}).limit(50).skip(50).sort({"price" : -1})
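The pagination arithmetic can be sketched in memory as follows. getPage is a hypothetical helper and the stock array is made-up data; the point is that page n skips (n - 1) * pageSize results after sorting.

```javascript
// Hypothetical in-memory equivalent of find().sort().skip().limit().
function getPage(items, pageNumber, pageSize) {
  const skip = (pageNumber - 1) * pageSize;   // results already shown on earlier pages
  return [...items]
    .sort((a, b) => b.price - a.price)        // like sort({"price" : -1})
    .slice(skip, skip + pageSize);            // like skip(skip).limit(pageSize)
}

// 120 made-up items priced 1 through 120.
const stock = Array.from({ length: 120 }, (_, i) => ({ desc: "mp3", price: i + 1 }));

const page1 = getPage(stock, 1, 50);
const page2 = getPage(stock, 2, 50);
const page3 = getPage(stock, 3, 50);

console.log(page1[0].price, page1[49].price);   // 120 71
console.log(page2[0].price, page2[49].price);   // 70 21
console.log(page3.length);                      // 20
```

The sort is applied before the skip and limit here, which is what makes the pages consistent with one another.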
Comparison order
If a key holds values of different types across documents, sorting on that key uses a predefined ordering of types. From least to greatest, the order is:
- Minimum value
- null
- Numbers (integers, longs, doubles)
- Strings
- Object/document
- Array
- Binary data
- Object ID
- Boolean
- Date
- Timestamp
- Regular expression
- Maximum value
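As a rough illustration of this ordering, the sketch below ranks plain JavaScript values by their position in the list above and sorts a mixed array by that rank. typeRank is a hypothetical helper. It covers only types with a direct JavaScript counterpart; arrays, binary data, object IDs, timestamps, and the minimum and maximum values are left out, and values of the same type are not compared further.

```javascript
// Rank a value by its position in the type order listed above.
function typeRank(value) {
  if (value === null) return 1;                    // null
  if (typeof value === "number") return 2;         // numbers
  if (typeof value === "string") return 3;         // strings
  if (typeof value === "boolean") return 8;        // boolean
  if (value instanceof Date) return 9;             // date
  if (value instanceof RegExp) return 11;          // regular expression
  if (typeof value === "object" && !Array.isArray(value)) return 4;   // object/document
  throw new TypeError("type not modeled in this sketch");
}

const mixed = [/x/, true, "a", new Date(0), 5, { k: 1 }, null];
const sorted = [...mixed].sort((a, b) => typeRank(a) - typeRank(b));
const sortedRanks = sorted.map(typeRank);

console.log(sortedRanks.join(", "));   // 1, 2, 3, 4, 8, 9, 11
console.log(sorted[0] === null, sorted[6] instanceof RegExp);   // true true
```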
Database Commands
There is one very special type of query called a database command. We’ve covered creating, updating, deleting, and finding documents. Database commands do “everything else,” from administrative tasks like shutting down the server and cloning databases to counting documents in a collection and performing aggregations. Commands are mentioned throughout this text, as they are useful for data manipulation, administration, and monitoring. For example, dropping a collection is done via the "drop" database command:
db.runCommand({"drop" : "test"})
db.test.drop() //shell helper that runs the same "drop" command
output
{
"nIndexesWas" : 1,
"msg" : "indexes dropped for collection",
"ns" : "test.test",
"ok" : true
}