The ability to proxy remote repositories and cache external artifacts from them is crucial, whether they are Docker images, NuGet packages, npm packages, tar.gz files or any of the other dependencies we use to create our own products. It speeds up our builds, ensures reliable access, gives us control over our bill-of-materials and offers many more benefits, making it a practice you cannot live without in today’s CI/CD domain.
When working with Artifactory, we will normally have one remote repository pointing to JCenter and others pointing to additional relevant public repositories such as Docker Hub, NuGet Gallery, the npm registry, PyPI and more. Complemented by the ability to configure virtual repositories, this gives our build tools, users and different clients a single endpoint from which to resolve all required artifacts, first looking through the local and cached items, and then searching remotely.
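For example, such a virtual repository can be defined through Artifactory's repository configuration JSON; the repository names below are illustrative, not real ones:

```json
{
  "key": "libs-release",
  "rclass": "virtual",
  "packageType": "maven",
  "repositories": ["libs-release-local", "jcenter-remote"],
  "defaultDeploymentRepo": "libs-release-local"
}
```

Clients resolve everything from `libs-release`, and Artifactory searches the aggregated local repository and remote cache before going out to the remote resource.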
This works very well when our dependencies are all specific release versions. Things get more complicated when you have geographically distributed teams working on the same project, or when you have different co-dependent projects that are constantly modified and need to stay up-to-date with each other’s snapshot versions.
Replication is the answer… or is it?
One solution is to use Artifactory’s ability to replicate repositories (pull for remote, push and multi-push for local), which is perfect if you need to stay in full sync. The sync can be timed on a cron expression or be triggered by events. This will take care of actively downloading new artifacts, deleting the ones that have been removed remotely, and making sure the properties are always in sync.
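As a rough sketch, pull replication for a remote repository can be configured through Artifactory's replication REST API (`PUT /api/replications/{repo-key}`); the cron expression below is illustrative, and the exact set of fields varies by Artifactory version, so check the REST API documentation for yours:

```json
{
  "cronExp": "0 0 2 * * ?",
  "enabled": true,
  "enableEventReplication": true,
  "syncDeletes": true,
  "syncProperties": true
}
```

With `syncDeletes` and `syncProperties` enabled, the replication job covers all three behaviors mentioned above: pulling new artifacts, deleting removed ones, and keeping properties in sync.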
The problem with replication is that it can become load- and bandwidth-intensive when repositories are laden with many artifacts. Wouldn’t it be great if you could get only those artifacts you need for your teams to sync up, and avoid unnecessary load and network traffic? Well, guess what. You can. Let me introduce you to one of Artifactory’s latest features…
Smart remote repositories
In Artifactory, a remote repository is represented by the URL of the remote resource from which you download and cache artifacts. But what if that URL happens to point to a repository in another instance of Artifactory? This kind of kinship begs to be utilized. If my Artifactory is proxying a repository in another Artifactory, there’s no reason why these cousin instances shouldn’t talk to each other and do some smart things.
- Automatic detection
So the first thing is that Artifactory recognizes its own kind: if your remote repository’s URL points at another instance of Artifactory, you will be presented with a dialog in which you can configure how the two instances will interact.
- Report statistics
One of the challenges on the other side of the proxy is knowing which items are being consumed by other repositories. Currently, we can see the number of times an artifact has been downloaded, when it was last downloaded, and by whom. We often use this information in our cleanup scripts to make sure we don’t remove an item and break someone’s build. The issue is that once an item is cached, the counter is increased by one, and any download request that follows will be served directly from the cache, making it impossible to know whether the item is actually still being used. When you check this option, every download request on the remote cache will also update a new field named Remote Downloads, which keeps track of the number of remote downloads made by other smart remote repositories proxying our repo. This field can be used in our cleanup scripts so we can safely go back to work on Monday without the fear of a note waiting for us promising that something very bad will happen before launch.
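To illustrate, a cleanup script might combine the remote-downloads counter with the last-download date before deleting a cached artifact. The `stats` keys below are simplified, hypothetical stand-ins for the per-artifact statistics Artifactory keeps, not its actual API shape:

```python
from datetime import datetime, timedelta

def safe_to_delete(stats, now=None, max_idle_days=90):
    """Decide whether a cached artifact can be cleaned up.

    stats is a dict with 'downloadCount', 'remoteDownloadCount' and
    'lastDownloaded' (an ISO date string) -- a simplified stand-in
    for the real statistics Artifactory tracks per artifact.
    """
    now = now or datetime.utcnow()
    last = datetime.fromisoformat(stats["lastDownloaded"])
    # An artifact still being pulled by downstream smart remote
    # repositories is not safe to delete, even though the local
    # download counter stopped moving once the item was cached.
    if stats.get("remoteDownloadCount", 0) > 0:
        return False
    return now - last > timedelta(days=max_idle_days)

# Locally idle, but still consumed remotely -> keep it.
stats = {"downloadCount": 1,
         "remoteDownloadCount": 42,
         "lastDownloaded": "2015-01-01T00:00:00"}
print(safe_to_delete(stats))  # False
```

The point of the sketch: without the Remote Downloads field, this function would see one stale local download and happily delete an artifact that other teams still depend on.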
- Sync properties
Normally, once you have cached an artifact from a remote repository, you will not be aware of any changes to the properties annotating the artifact at the remote resource. But with smart remote repositories, you can be. If you check this box in your remote repository configuration, every time there is a request to get an artifact’s properties, Artifactory will validate their values against the corresponding properties on the original artifact in the remote instance. Any changes to properties on the remote item (update, add, remove) will be automatically synced to your locally cached copy without you having to download the artifact again. So, for example, if you download an artifact whose status is “Release Candidate”, and the remote team building it later changes the status to “Integration Test Failed”, the status on your locally cached copy will be automatically updated the next time you check if you’re good to go with that artifact. You don’t want to release with an artifact that fails integration testing, now, do you?
- Remote list browsing where you never thought possible
Many of the package types supported in Artifactory do not offer remote list browsing, for a variety of reasons. However, since smart remote repositories keep things in the family, Artifactory knows how to overcome this limitation and lets you browse remotely in places you always wanted to, but couldn’t, such as Docker, NuGet, npm, Bower, PyPI and many more.
- Deleted indication
If an item you have cached has been deleted from the remote repository, you will see an indication in Artifactory and know that you are using an artifact that is no longer available. This can be very useful: if for some reason you depend on that artifact, you might want to move it to a local repository, since the next time you clean up your cache you will no longer be able to download it again from the remote repository.
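For those managing repositories through the REST API rather than the UI, the smart remote behaviors discussed in this post are, to the best of my knowledge, grouped under a content-synchronisation block in the remote repository configuration JSON. Treat the exact field names below as an assumption to verify against the documentation for your Artifactory version:

```json
"contentSynchronisation": {
  "enabled": true,
  "statistics": { "enabled": true },
  "properties": { "enabled": true },
  "source": { "originAbsenceDetection": true }
}
```

Here `statistics` maps to the Remote Downloads reporting, `properties` to property sync, and `originAbsenceDetection` to the deleted indication described above.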
This is just the beginning. You can look forward to features like synchronization of download stats, transitivity sync when chaining multiple Artifactory instances, executing AQL searches on remote repositories, pushing artifacts remotely and much more as smart remote repositories just keep getting smarter.
(Updated Oct 2015)