Remote Repositories

This guide will show you how Mercurial can be used to talk with remote repositories over HTTP and SSH.

Contents

Working with Repositories over HTTP
Working with Repositories over SSH
- Filesystem Permissions
Caching of HTTP(S) Credentials
Configuring Shorthand URL Schemes
- Shortening HTTP URLs
- Shortening SSH URLs
Exercises

In the Basic Mercurial guide, you saw how Alice and Bob could push and pull between each other on the same filesystem. This is a fine setup when Alice and Bob share a filesystem between them, but most users don’t have direct access to each other’s files like that. This is why Mercurial can access remote repositories over a network connection.

Mercurial speaks two network protocols:

HTTP: the protocol used by normal webservers.

Mercurial comes with a small built-in webserver that you can start with hg serve and which lets you browse the repository history using your normal web browser. For more heavy-duty use, the hgweb.cgi script is recommended.

SSH: the secure shell protocol used on many Unix systems.

If you already have an SSH login on a server with Mercurial installed, then using SSH is the easiest way to clone a repository. It is possible to set up an account with a restricted login shell so that users can only execute Mercurial-related commands and do not get a full login shell; see the hg-ssh script for details.

Working with Repositories over HTTP

We start by taking a look at how you can interact with repositories over HTTP. This protocol is very popular since it integrates well with most companies’ existing infrastructure: when you have a functioning webserver that can be accessed on port 80, then you have all you need to serve Mercurial repositories too.

Cloning over HTTP

We will start by letting Alice clone a small example repository:

alice$ hg clone https://bitbucket.org/aragost/hello
destination directory: hello
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 1 changes to 1 files
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved

As you can see, this is not really different from when Alice cloned a repository using a filesystem path: in both cases, Alice ends up with a complete copy of the repository and the default branch is checked out.

Under the covers, Mercurial is able to make the local clone much more efficient than a clone made over HTTP. This is not just due to the fact that reading data over the network is slower than reading it from a disk, but also because a local clone will re-use the space in the .hg directory by using hardlinks between the files. The two clones will thus share the disk space used by the two .hg directories and it is only the working copies that take up new disk space. This is what makes it feasible to create as many local throw-away clones as you like.
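
To make the hardlink idea concrete, here is a minimal Python sketch. It does not use Mercurial at all: it only creates a file and a hardlink to it in a temporary directory, to show that both names refer to the same data on disk. The file names are made up for the illustration, and the sketch assumes a filesystem that supports hardlinks.

```python
import os
import tempfile


def demo_hardlink():
    """Create a file plus a hardlink to it and report what the OS sees."""
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical file names, chosen only for this illustration.
        original = os.path.join(tmp, "original.i")
        linked = os.path.join(tmp, "linked.i")

        with open(original, "wb") as f:
            f.write(b"revision data")

        # A hardlink is a second directory entry for the same file.
        os.link(original, linked)

        a, b = os.stat(original), os.stat(linked)
        # Same inode means the data exists only once on disk;
        # st_nlink counts how many names point at it.
        return a.st_ino == b.st_ino, a.st_nlink


same_inode, link_count = demo_hardlink()
print(same_inode, link_count)
```

Because both names point at the same data, the second name takes up almost no extra disk space, which is the effect a local clone relies on for the files in the .hg directory.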

Alice can now make her own commits in her clone:

alice$ cd hello
alice$ echo "Hello, World!" > hello.txt
alice$ hg commit -m "Add comma"

This is the essence of distributed revision control — Alice can get her own independent copy of the repository and work on it locally. She can compare her clone with the remote server:

alice$ hg outgoing
comparing with https://bitbucket.org/aragost/hello
searching for changes
changeset:   1:61c1daa1d929
tag:         tip
user:        Alice <alice@example.net>
date:        Sat Jan 22 10:00:00 2011 +0000
summary:     Add comma

She won’t be able to push her changeset to the server since she does not have write access to this particular repository.

Serving a Repository over HTTP

As mentioned earlier, Mercurial has a built-in webserver. You can use this to quickly share a repository with another machine on your LAN, or even for browsing the history yourself.

We will let Alice serve her clone of the hello repository:

alice$ hg serve 
listening at http://localhost:8000/ (bound to 127.0.0.1:8000)

The repository can now be browsed with a normal webbrowser at the address http://localhost:8000/. There you can see the project history and the changeset graph, you can examine individual changesets, you can see lists of tags and branches, you can annotate files, and you can retrieve any revision as a tarball or a zip-file. In other words, the built-in webserver is very convenient for humans, as well as for computers :-)

Bob can make a clone of Alice’s repository:

bob$ hg clone http://localhost:8000/ hello
requesting all changes
adding changesets
adding manifests
adding file changes
added 2 changesets with 2 changes to 1 files
updating to branch default
1 files updated, 0 files merged, 0 files removed, 0 files unresolved
bob$ cd hello
bob$ cat hello.txt
Hello, World!

If Bob makes a change and tries to push it back, he is met with the following error:

bob$ hg push
pushing to http://localhost:8000/
searching for changes
remote: ssl required
remote: ssl required
updating 61c1daa1d929 to public failed!

What happens is that Mercurial’s webserver won’t let you push over plain HTTP by default: it requires you to use an HTTPS URL. Alice can disable this requirement by using --config web.push_ssl=No on the command line when she serves the repository. She first kills the old hg serve process and then starts a new one:

alice$ hg serve --config web.push_ssl=No 
listening at http://localhost:8000/ (bound to 127.0.0.1:8000)

When Bob tries again he is met by a new error because repositories are read-only by default:

bob$ hg push
pushing to http://localhost:8000/
searching for changes
abort: authorization failed

Notice that the push was aborted without giving Bob any chance of entering a username or password. The reason for this is that the built-in webserver does not support authentication. That is, there is no user management built into it. This may sound odd, but the idea is that you would not run hg serve in a production environment. Instead you would run the hgweb.cgi script supplied with Mercurial and you would run this CGI script in a real webserver such as Apache. This webserver will have the necessary infrastructure to do proper user authentication. An advantage of this setup is that you can do the authentication using whatever method you prefer: if you have a setup where you authenticate users against an LDAP database, then you just reuse that for Mercurial.

For our example, we will let Alice disable the authentication check with yet another command line option:

alice$ hg serve --config web.push_ssl=No --config "web.allow_push=*" 
listening at http://localhost:8000/ (bound to 127.0.0.1:8000)

Bob can now push his changeset back to Alice:

bob$ hg push
pushing to http://localhost:8000/
searching for changes
remote: adding changesets
remote: adding manifests
remote: adding file changes
remote: added 1 changesets with 1 changes to 1 files

Here there was no authentication, but for most real-world repositories you will have to authenticate in order to push changesets back (and sometimes also in order to pull changesets). We will discuss caching of HTTP(S) credentials below.
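
Each --config section.name=value override used with hg serve corresponds to a setting in a configuration file. As a rough illustration, the following Python sketch turns the two overrides from Alice's command line into the equivalent configuration text. The helper name overrides_to_hgrc is hypothetical and not part of Mercurial. Keep in mind that letting everybody push without SSL is only reasonable on a trusted network.

```python
def overrides_to_hgrc(overrides):
    """Convert 'section.name=value' strings into hgrc-style text.

    Hypothetical helper for illustration; Mercurial has no such function.
    """
    sections = {}
    for item in overrides:
        # Split 'web.push_ssl=No' into 'web.push_ssl' and 'No'.
        name, _, value = item.partition("=")
        # Split 'web.push_ssl' into section 'web' and key 'push_ssl'.
        section, _, key = name.partition(".")
        sections.setdefault(section, []).append((key, value))

    lines = []
    for section, pairs in sections.items():
        lines.append(f"[{section}]")
        lines.extend(f"{key} = {value}" for key, value in pairs)
    return "\n".join(lines)


print(overrides_to_hgrc(["web.push_ssl=No", "web.allow_push=*"]))
```

The printed text is a [web] section with the push_ssl and allow_push settings, which could be placed in the repository's .hg/hgrc file instead of being repeated on the command line.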

Filesystem Permissions

When you access a repository through the hgweb.cgi script, then the access is really done by the webserver. The webserver process will run as some user on the server and that user must have read access to the repository files in order to serve them.

When people push changes back over HTTP, it is also the webserver process that writes the new files in the .hg directory. The usernames embedded in the changesets play no role here.

Working with Repositories over SSH

The other network protocol supported by Mercurial is SSH. When using SSH URLs, Mercurial will log in to the server and set up an SSH tunnel between two Mercurial processes. The two processes will then communicate with each other in order to push or pull changesets.

This implies that there needs to be an SSH account on the server. Many system administrators will therefore prefer the HTTP-based setup instead, since that ties into their existing webserver setup.

However, if you do have an account on the server and Mercurial is installed on the system, then the SSH protocol is seamless for a user. The only thing to remember is that the syntax is a URL syntax and not an scp- or rsync-style path. So you write:

$ hg clone ssh://server/path/to/repository

and not:

$ hg clone server:path/to/repository

Note also that path/to/repository is relative to your home directory on the server. If you want to use an absolute path on the server, then use a URL like this:

$ hg clone ssh://server//absolute/path/to/repository

The first slash is part of the URL syntax and the second slash is part of the path on the server, so the above URL will find /absolute/path/to/repository on the server.
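
The slash rule can be sketched in a few lines of Python. This is only a model of the rule as described here, built on the standard urllib.parse module; it is not how Mercurial itself parses URLs, and the function name remote_path is made up for the example.

```python
from urllib.parse import urlsplit


def remote_path(url):
    """Return (host, path, kind) for an ssh:// URL.

    Illustrative model of the rule in the text, not Mercurial's own parser.
    """
    parts = urlsplit(url)
    if parts.scheme != "ssh":
        raise ValueError(f"not an ssh:// URL: {url!r}")

    # The first slash after the host belongs to the URL syntax, so drop it.
    path = parts.path[1:]

    # If a slash remains, it is part of the path on the server.
    if path.startswith("/"):
        kind = "absolute"
    else:
        kind = "relative to home directory"
    return parts.hostname, path, kind


print(remote_path("ssh://server/path/to/repository"))
print(remote_path("ssh://server//absolute/path/to/repository"))
```

The first call reports path/to/repository as relative to the home directory, while the second reports /absolute/path/to/repository as an absolute path.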

Unlike with HTTP URLs, you can use an SSH URL as a target for hg clone. This lets you do:

$ hg clone . ssh://server/path/to/repository

in order to clone the current repository to the server.

Filesystem Permissions

Because Mercurial makes a login on the server, it is the user on the server that must have read access to the repository files in order for you to make a clone or to pull changesets. Likewise, it is the user on the server that must have write access to the repository files in order for you to push changesets back.

If you see errors when pushing changes over SSH, then add --debug to your push command and see what Mercurial is doing. Try logging in with the same user over SSH and check that you can access the files.
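
Once you are logged in on the server as the user in question, a quick check along these lines can help. The following Python sketch is only a first approximation: it looks at the .hg directory itself and not at every file inside it, and the function name check_repo_access is hypothetical.

```python
import os


def check_repo_access(repo_path):
    """Report whether the current user can read and write the .hg directory.

    Only the top-level .hg directory is checked, so this is a quick
    first test and not a complete permission audit.
    """
    store = os.path.join(repo_path, ".hg")
    return {
        "exists": os.path.isdir(store),
        # Listing a directory requires read and execute permission.
        "readable": os.access(store, os.R_OK | os.X_OK),
        # Creating files in a directory requires write and execute permission.
        "writable": os.access(store, os.W_OK | os.X_OK),
    }


# Check the current directory; replace "." with the repository path.
print(check_repo_access("."))
```

If readable is false, cloning and pulling will fail for that user; if writable is false, pushing will fail.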

Caching of HTTP(S) Credentials

When you talk to a webserver, it can prompt Mercurial for a username and password in order to authenticate you. You will then normally be asked to enter the information in the prompt:

$ hg clone https://bitbucket.org/aragost/private
http authorization required
realm: Bitbucket.org HTTP
user: aragost
password: <secret>
abort: http authorization required

Since you will have to authenticate on every command that involves the remote repository (that is, commands like hg clone, hg pull, hg push, hg incoming, and hg outgoing), you will quickly get tired of this. There are several ways to make Mercurial save the credentials. We will present them here in order of preference.

Tip

You may be wondering how you can make Mercurial cache your SSH passphrase. The answer is that you cannot do this — the SSH authentication is external to Mercurial and you must use a standard SSH agent to cache the passphrase.

An SSH agent is a program that runs in the background and keeps a decrypted version of your SSH private key in memory. Whenever you need to make an SSH connection, the ssh program will ask the agent if it has a suitable decrypted private key. If so, the connection can be made without you entering any passphrase; otherwise ssh will prompt you for it as usual.

You add your key to the SSH agent with ssh-add on Linux and Mac OS X, and you use Pageant when using PuTTY on Windows.

Keyring Extension

In short:

Pros: passwords are stored in an OS-specific secure backend, most secure option.

Cons: requires third-party extension.

The keyring extension will hook into Mercurial and intercept password requests. The passwords you enter are then stored securely in an OS-specific password database and you won’t have to enter them again. It stores passwords used for HTTP(S) authentication and SMTP authentication (as done by the patchbomb extension among others).

This is the standard solution on Windows when you use TortoiseHg, since it ships both the extension and the extra libraries needed to talk with the Windows password backend. On other systems, you will have to install the appropriate backend libraries yourself.

Stored in User Configuration File

In short:

Pros: standard feature in Mercurial. Makes it easy to set up the password used for all repositories on a given host.

Cons: passwords are stored in a plaintext configuration file. Care must be used if the file is shared with others.

You can save the credentials directly in your Mercurial configuration file. You do this with the auth section:

[auth]
bb.prefix = https://bitbucket.org/
bb.username = alice
bb.password = <secret>

The [auth] section contains a number of entries, and the entries are grouped by an arbitrary key chosen by you. Above, we used bb as the key for Bitbucket, but we could have picked anything. If you want to store passwords for more sites, then you need to use a different key for each host:

[auth]
site-a.prefix = https://hg.site-a.net/
site-a.username = userA
site-a.password = <secret-a>

site-b.prefix = https://site-b.org/repos/
site-b.username = userB
site-b.password = <secret-b>
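
To illustrate how the grouping by key works, the following Python sketch reads the configuration above with the standard configparser module, groups the entries by key, and picks the entry whose prefix is the longest match for a given URL. It is a simplified model and not Mercurial's implementation: the function names are hypothetical, and only prefixes written as full URLs are handled.

```python
import configparser

# The example configuration from the text, with placeholder passwords.
HGRC = """
[auth]
site-a.prefix = https://hg.site-a.net/
site-a.username = userA
site-a.password = <secret-a>

site-b.prefix = https://site-b.org/repos/
site-b.username = userB
site-b.password = <secret-b>
"""


def load_auth_groups(text):
    """Group 'key.field = value' entries of the [auth] section by key."""
    # Disable interpolation so that '%' in a password is taken literally.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    groups = {}
    for name, value in parser.items("auth"):
        key, _, field = name.rpartition(".")
        groups.setdefault(key, {})[field] = value
    return groups


def find_credentials(groups, url):
    """Return the entry with the longest prefix matching url, or None."""
    best = None
    for entry in groups.values():
        prefix = entry.get("prefix")
        if prefix and url.startswith(prefix):
            if best is None or len(prefix) > len(best["prefix"]):
                best = entry
    return best


groups = load_auth_groups(HGRC)
match = find_credentials(groups, "https://site-b.org/repos/project")
print(match["username"])
```

Here the URL under https://site-b.org/repos/ selects the site-b entry, so the username userB would be used, while a URL on an unlisted host matches nothing and would still lead to a prompt.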

Embedded in Push/Pull URL

In short:

Pros: requires no outside configuration.

Cons: password is stored in plaintext.

This final way takes advantage of a built-in feature in the specification of URLs: one can embed a username and password directly into them. The URL syntax is:

scheme://username:password@domain:port/path

so if you execute

hg clone https://alice:<secret>@bitbucket.org/aragost/private

then Mercurial will automatically use alice as the username and <secret> as the password when logging into Bitbucket. If the clone is successful, then the full URL is stored as the default path in the .hg/hgrc file as usual. Because the username and password are stored in the URL, future invocations of hg pull and hg push will authenticate automatically without any prompts.
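
The parts of such a URL can be inspected with Python's standard urllib.parse module, as in the sketch below. The password s3cret and the helper names are made up for the example. The second helper removes the credentials again, which can be useful before pasting a URL into an email or a bug report.

```python
from urllib.parse import urlsplit, urlunsplit


def describe(url):
    """Split a URL into the parts named in the syntax above."""
    parts = urlsplit(url)
    return {
        "scheme": parts.scheme,
        "username": parts.username,
        "password": parts.password,
        "domain": parts.hostname,
        "port": parts.port,
        "path": parts.path,
    }


def strip_credentials(url):
    """Return url without the username:password@ part.

    Simplified: IPv6 host literals are not handled.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


# Hypothetical password, used only for this illustration.
url = "https://alice:s3cret@bitbucket.org/aragost/private"
print(describe(url))
print(strip_credentials(url))
```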

Configuring Shorthand URL Schemes

When you work a lot with repositories on the same host, then it might become annoying to repeatedly type:

hg clone https://hg.my-long-servername.com/repos/

Shortening HTTP URLs

Mercurial has a standard extension that will help you shorten those URLs. You enable the schemes extension and can then add the following to your configuration file:

[schemes]
bb = https://bitbucket.org/

This lets you write:

hg clone bb://aragost/private

instead of the longer:

hg clone https://bitbucket.org/aragost/private

See hg help schemes after enabling the extension for the full syntax.
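
The basic idea behind the expansion can be modelled in a few lines of Python. This sketch only covers the simple case shown above, where the rest of the URL is appended to the configured prefix; it is not the extension's actual code, and the function name expand is hypothetical.

```python
# Mirrors the [schemes] section shown above.
SCHEMES = {"bb": "https://bitbucket.org/"}


def expand(url, schemes=SCHEMES):
    """Expand a shorthand URL such as bb://aragost/private.

    Simplified model: the part after '://' is appended to the prefix.
    URLs with an unknown scheme are returned unchanged.
    """
    name, sep, rest = url.partition("://")
    if sep and name in schemes:
        return schemes[name] + rest
    return url


print(expand("bb://aragost/private"))
```

Running the sketch prints the long form https://bitbucket.org/aragost/private, and a URL that already uses https:// passes through untouched.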

Shortening SSH URLs

You can use the schemes extension for SSH URLs too, but it is also interesting to note that OpenSSH has its own way of shortening URLs: you can define a short alias for a host. Add these lines to your ~/.ssh/config file (creating it if needed):

Host bb
Compression yes
HostName bitbucket.org
User hg

This will let you write:

hg clone ssh://bb/aragost/private

instead of:

hg clone ssh://hg@bitbucket.org/aragost/private

Notice how this configuration even sets the username for you, something that is particularly easy to forget when using SSH with Bitbucket. We also used the opportunity to enable compression for the SSH tunnel, which can improve performance on slow connections.

If you are using PuTTY on Windows (TortoiseHg comes with a bundled install of PuTTY), then you can do the same by configuring and saving a connection. If you save the configuration under the name bb, then you can begin using ssh://bb/ style URLs with Mercurial.

Exercises

Go to https://bitbucket.org/ and create an account for yourself.
Create a private repository called test and clone it to your local machine.
Add and commit a file to the repository and push it back to Bitbucket.
Make a clone of https://bitbucket.org/aragost/hello/ on your local machine.
Create a new repository on Bitbucket called hello and push the hello clone from your machine up to Bitbucket.

Notice how you can push changesets into an empty repository. This is because you can expand a hg clone to hg init followed by hg pull (except that you won’t get hardlinks as described above).

What happens if you try to push from your hello clone to your test clone on Bitbucket? How does Mercurial know if the repositories are related?