Viktor Gamov / Hazelcast, May 2016: Application scalability is hard. Application scalability done right is even harder.
While some might try to write a distributed clustering framework from scratch, a Pragmatic Developer, from another hand, most likely will leverage mature open source framework, like Hazelcast, that takes care of all the hassle. In this session, you will learn how the pragmatic developers of JFrog benefited from using battle-tested clustering capabilities of Hazelcast while building Artifactory High Availability feature.
My name is Viktor Gamov I work as a solutions architect with Hazelcast. Essentially what solution architect does it solves the problems. And, you know, I think that you got the reference from the gentleman who also solves all your problems.
And today we’re going to talk about the Artifactory HA and what kind of choices developers did to make this solution solve their problems.
So essentially, Artifactory H — HA has simple idea. Deliver your binaries at any cost. So that’s why there’s multiple instances of Artifactory. Many things will be duplicated, because in real world everything might break. So this is why the high availability of the product that essential for infrastructure for developers, this is like a very, very big goal for JFrog as a developer of this product.
And many things that I’m going to talk to you today, you can find in the Artifactory HA. So essentially, as a distributed system or highly available system, multiple problems need to be solved in Artifactory HA.
So, first of all, this is diagram that explains the basic architecture of Artifactory HA. As you can see, as a back end, Artifactory can store data in the multiple sources, including database in full cache mode. So you also include the binaries in the database. Also the binaries can be stored on network attach storage. Why network attach storage? Because the multiple instances can, need to have access to this. And it’s a quite rational choice to use network attach storage or any network attach storage that will be available for multiple node access. Plus, some of the things that’s coming into the Artifactory chain, for example integration with Amazon S3. So in this case, you can get over this NSF monstrosity and […] things and use the right and the scalable solution. From the very fine […].
So some the information that Artifactory clusters — Artifactory members need to exchange information about the caches, information about some of the stuff that stores in method data, plus concurrent control over your sources because, as you can see, in the FAS here or the database here is the shared resource so we need to have some sort of the control over the access, like over the concurrent access.
So. And there’s a couple, you might even say it’s a problem, but I want to think of this as challenges. So couple challenges. It’s a concurrent and distributed access for binaries. As you can saw — as you saw from the picture, the NSF is like the central, center storage, even though it has some of the backups and some of the resilience […] on the hardware level, it’s still one single source from perspective of Artifactory. So to provide access to these resources, this access can be, should be controlled. And because this access should be controlled over what node will do something. So for example there’s a master node that will run garbage collection on your Artifactories or run some of the node can run the indexation when you upload new jar and you need to parse the metadata of Maven or other repo — other artifact.
So this kind of challenge is that, because the multiple instances, they need to have the same picture of the world, the same picture of who has the access to particular binary in one moment of time. Some of the information that Artifactory members need to access is information metadata about repositories, about which file belongs to what repository, and et cetera. So repository cache is another challenge that the developers of Artifactory HA […].
So as I’ve already mentioned, some of the tasks that run in the background, for example, when you run the index or when you run some of the check sum checks in garbage collections so this job’s need to solidify a different component of the systems that job is done or job in progress and many other things. So essentially identification about job progress is another — another thing that we’re going to talk about how this work in Artifactory chain. And for a reason that you probably all understand, the connectivity is essential for distributed system. Right? So in this case, the Artifactory chain is more network bound system than I/O bound. Right? In this case the speed of the network is important, and plus, discovery of other members, when these other members of the cluster will join. This information needs to be stored for failover in case of one node cannot perform the job so we need to send the job to another node if the node is not available.
Some of the things that I just — the things that I just explained I just focus on four things because it’s easy to grasp. Easy to remember when, because I wanna — you people remember something after this talk, even though yesterday was nice […]. And it’s a couple of things easy to remember, the three, four things that people can usually remember.
And this is a couple of things I’m going to focus. Particular APIs in particular, let’s call it pieces of this jigsaw puzzle that the Artifactory HA developers adopted in the product. So we’re going to talk about distributed lock that provides concurrent access over — over shared resources. Shared resources like binaries. We’re going to talk about a distributed map that provides the distributed cache for some of the metadata information in Artifactory HA. A distributed topic that provides the framework to do distributed notification — notification of different components. Membership listener allows to get information when the member join or member leaving or some of the information about member has changed. Plus I’m going to talk about something called the Discovery SPI that […].
So why Hazelcast? And what is Hazelcast essentially. So Hazelcast is essentially also solve like the three simple problems in distributable but there’s like no simple problems. This one you want to make them possible solve and three things is distributed storage, distributed messaging, and distributed computing. Today we’re going to focus on distributed storage and distributed messaging. Maybe next year I’m going to talk about distributed computing […].
For Java developers, Hazelcast looks like distributed versions of Java collections. Essentially that’s — this is pretty much it. So this is why Hazelcast is very easy to integrate with any — with any product. So let’s take a look on the first, on the first problem is distributed concurrency control.
So doing distributed concurrency control might be easy, so how many of you, if you have developers here, how many of you done, like, distributed lock implementation in your life at least once. Yeah. We have people here who know how it’s done. You can live with this but it can be quite painful to support, quite painful to use, for example, I use in my […], I use database as my storage of distributed lock. So one node will ride lock into table into database another node will read this from database. As you can see from this picture, you will get there. You know, you’ll get from the place you want to go, but it depends, you want a, like, smooth ride, or you want a painful ride, and you want to, you know, enjoy the ride.
So that’s why we’re going to talk about the component that called ILock or distributed lock. So as I already said, Hazelcast provides distributed version of Java collection, so that’s why it’s very easy to start using this. So in this case, the ILock is extend Java which is concurrent lock. It follows similar paradigms. It’s reentrance so if member acquired the lock once, acquires the same lock once again, you will not be in the situation of the deadlock. So you will — you’ll be able to get the lock. If other member tried to acquire this lock, nothing’s going to happen. Lock will be released if member goes down. There’s other semantics of releasing or, like, forcing to release the locks that’s available. And lock will be acquired by the first member who tries to acquire. And there is a condition. There’s additionally […] same thing with Java which is concurrent lock. There’s a condition that can be used from the other member that will be condition for releasing this lock.
So essentially how it looks like, it looks like — like this. So Hazelcast instance. This is starts the Hazelcast process. In this case it would be embedded process. Now we’re getting lock. And this piece of code can be executed in multiple — multiple instances of the application so this lock will be distributed and will be available for access for any — for any member. Right? So essentially, couple of things to remember. We need to get lock and do not forget to release it because in this case no one else can get access unless you, like, force unlock.
So let’s take a look real quickly. How this thing actually works in real life and how does it look from perspective of live application. Just one second. My computer’s switching screens. All right. Okay. Where’s my terminal? Are you guys able to see well? Let me make it bigger. This is kind of stuff that I didn’t check before. Shame on me. All right.
So what I have here right now is a simple application console. Essentially it has embedded Hazelcast and we’ll talk about different topologies that Hazelcast supports but think about this as come online interface for some things I can do in Hazelcast. I can do like help and see different commands that they can run. So in this particular case I’m interested in using a log on particular map, so in this case. So first of all I’ll do, one member can do lock for key one. So in this case key was acquired and when I’m trying to do something with data by key one from the other member this guy — like this guy over here, it’s a different process. So it’s a different process, it’s a different JVM. So what I’m trying to do. M dot. First of all I can get something. Some data, right. I can try to get, and I’ll get the result now because nothing is there yet. But when I’m trying to m dot put and I’ll do dice like test one. You see, this process stuck because there is right now another process that holds this lock.
So this guy from the other part what you can do, you can do put test, test two without any problem. So result returns a previous result in case it’s null. If I run it like this it will return previous result which is a test two. Essentially same semantics that map dot put provides in Java collections. So what this guy can do? So in this case it just sits there and wait. So now I need to unlock. As we can see, successfully unlocked, write the data, and return a last value that was written by this process. Right? Does it make sense? I hope it does.
And many different API wise things support it. So I can see here. If I go here. Say map. I can try to get lock for some of the time. If I didn’t get the lock, you know, I can wait, lock will then be unavailable, I’ll just not going to get it and will fail. And multiple other things we gonna talk about this a little bit later.
Okay. Now I’m returning to my slides.
So. So the first use case was distributed concurrency control. So essentially if Artifactory HA needs to do some important stuff in some particular binary, some particular artifact, it can acquire lock. A lock can be acquired for the path so path would be essentially key in this case because the path is sort of unique. So we have this — we don’t have this problem because — we don’t have the problem with uniqueness of the keys.
So next thing is distributed cache. So distributed cache essentially is the piece of — piece of application, piece of software that allows too many processes, many JVMs, get access to same data. Cache data. And because the data is in memory, cache provides fast access comparing to the access directly through the calling of the API. Or in […] database, sequel script. Or trying to run some computation. So essentially cache is a temporal storage that provides fast access.
Ya cache. Cache, this is things that we use to speed up. But when we talk about cache we think about something like a memcache usually, right. But couple things we cannot get from the memcache. Right? So in this case, Artifactory HA is a product. It needs to be self-contained. So the developers can put this requirement for the clients, saying hey if you use a cache, you need to have another piece of infrastructure, blah, blah, blah or you want to do, like, scalable cache, you need to bring another infrastructure and in this case, will work with this piece of software but we’re trying to avoid, they trying to avoid this. And before I’m going to — into the place about distributed caches, I want to talk about why memcache is actually maybe not that bad — maybe better solution, a little bit about data distribution in distributed system and distributed cache.
I have a question to you. What do you see in this picture? Remember, we’re talking about data distribution. No, you quiet. Sit quiet.
Okay so we see here clones. Correct. What do we see here?
[Audience] Now that’s a tough one. Clones are easy to […].
Who knows why I see, what I have here. So essentially I give you small hint. On both pictures you have a ship.
Stripes. Okay. What else? What’s the other name that I’m looking for?
[Audience] Here you go. That’s why you use a chief architect.
Yes, we see here shards. And this is why I like to use this picture to illustrate two basic concepts of data distribution in distributed systems. It’s a replication and sharding. Also known as a partitioning. So what’s the difference between them? When we have replication, we essentially store same data on each and every node. So in this case, this approach might be not super scaled because it will be limited by the capacity of the smallest member of the cluster when you try to replicate. And […] the same data, you need to make sure data is a consistent over and over. And this is why sharding considered as de-facto approach for distributed data.
So sharding essentially provides you a view, provides you the way how the data would be distributed so in this case bring in more machine will scale storage for this shared data. And when we talk about deployment options, the, some of the deployment options might be very natural for some of the products in case of Artifactory HA, it’s embedded mode. This is why I said things like the memcache maybe not suitable. Because in this case, you need to bring in another infrastructure and the memcache is written on C++ you need to have some driver that you can get access to memcache systems, in other network calls, et cetera.
So when you have this distributed cache that’s embedded in your application, you don’t have big promise because most of the time, like we’re Java developers, I assume, most of us Java developers. And, at least the Artifactory HA written in Java. So using the Hazelcast has embedded libraries. Essentially just a shard drop in a web […] lib directory.
But still, it doesn’t limit you with choices. And Hazelcast provides also classic cluster client apology where, you guess cluster can be deployed separately. And it allows to do management or separation of the concerns of scaling application versus scaling the storage for the data. Embedded choice is very natural for application that need to use the clustering capabilities.
So, and when I’m talking about the data, data is always in memory. But how can it preserve data safely and with resilience? I will store the backups in memory. Does it sound — does it sound very good for anyone here? It does sound good for me. And I will explain why.
So, the way how it looks like for a developer, developer works with one simple cache or map abstraction. And in this case, regardless of where the data is actually stored, it always has simple put and get API. But internally and it’s handled by Hazelcast in this case, data is distributed. So now if we have four nodes of Artifactory HA, data that this Artifactory HA stores will be distributed across multiple nodes, so you can see, we have here whole piece of data, but this data will be stored — this piece of data will be stored node one, node zero, and the backup of this data will be stored on another node. So in this case, if this node goes down, or this node goes down, data will be restored from the backup and the backup data of this — of this partition still available in the primary partition.
So this is simple — simple view on the sharding or partitioning. There is a different configuration options available, for example replication factor or like backup count. We call it backup count. People like to call it replication factor. There might be multiple backups for the same data. So in this case, you can — you want to try and survive many failures. You can survive failures of two nodes simultaneously. And when the node goes down, the Hazelcast takes care of rebalancing data. Repartitioning and restoring data from the backup.
Let’s have a quick look how it works in the real world application. So while my computer’s switching to my console screen, I’m trying to — I will explain you premise. I’m gonna use the same console application that I use for doing logs. And because — because it essentially represents the thing I just want explained. And I already have this running here. So let me do. So one of the commands that I can use to populate data, so I just need some of the data so I can do my […], if failover.
So there’s a command called, where is it? Put many. Put many, I will put some of the first hundred thousand of elements with one kilobyte in size in this memory. And the way how it can absorb this is the management center console. So this is a web application that allows me to see the status of my cluster. Or of my application that use embedded Hazelcast. So we have here two nodes. And also with this management console, I see how this data is distributed across multiple nodes. I don’t know if you see here and I don’t have this fancy thing with — no I have it. Like, pretty much responsive design. Right? Not super responsive, though.
So as you can see here, data is distributed across –across nodes and I have data sort of node one, backup to node two. So what I wanted to do here, I just want to come here, go ahead and just kill this node. As you can see, member number two recognizes there was a failure. It just rebalanced the cluster and if I switch back to management console, oops, wrong one, yes, this one, data is still there. Data is not lost. Though I just, you know, took the plug from the socket, essentially. Yeah. As you can see here, backups doesn’t mean, doesn’t really mean a thing when you have just one node. So in this case, there is no backups if you’re running in just one node. And as you can see from the console, the memory consumption of this particular starts growing cause that data that initially stored in two nodes now need to be fit in one node.
Another, another easy thing how we can fix it and I just need to bring in another node – another node back. They will join in a second. And they establish connection, and now I can continue doing my experiments here and data will be rebalanced and overall data consumption on separate node will go down. So in this case, adding more node will contribute with another storage in your application. Couple other things that I can see from the management center. I’m going to return to this one in couple seconds. When we’re going to talk about the — some of the things, how it’s done in Artifactory. But essentially you can see different data structure can be — can be seen over, over here.
All right. Do I have any questions so far from the audience? I don’t know why my power point switching is terribly slow. I’m trying to fill this awkward silence. With some awkward —
[Audience] Just do something.
Okay. Here we go. So I will not be able to.
All right. The third thing is the messaging and the notification between different components of the system. And again, you need to remember where we are. We’re already on the Java side, we’re on the Artifactory HA side and different components need to get a notification. Right? So what you can do, there’s some of the jobs that can listen for the topic. And if there is a new notification, this job needs to do something.
And the way how it works in the Hazelcast, it provides — Hazelcast provides distributed queue if you need to have one publisher, one subscriber type of access. Plus, there is the topic which allows to have one publisher, many subscribers. So different components can send the data over to Topic Bus.
And this Topic Bus essentially distributed so meaning that the topic, access to the topic, can be access available for any component in the systems. So if you think about this client but essentially just a different component. One component can publish. Another component can listen. For example, this is a publisher. There’s some — some of the main — main member that will send this information because, for example, if there is a garbage collection procedure is happening, or the index procedure, other components need to be notified when it’s done.
Next. Connectivity and discovery. So, to get some of the information about the cluster and to store this information somehow so the Hazelcast, Artifactory HA uses this component called a membership listener. So when they’re two ports that you usually configure in your configuration when you configure Artifactory HA. You can configure just the typical port for your HTTP. Where your build script and your build infrastructure will hit to get the artifacts and there is a membership port. So essentially, there is a, you need to provide the port that will be used other members to join the cluster. And with the membership listener some of the information about other members will be acquired. So when the members — when the new member joins the cluster, we’ll get the membership event where we can get full information like an IP address, some other information: IP address, port, instance name, et cetera. Same thing with, when the membership — the member is removed. If someone reboot of the member is changed, this listener will also will be triggered and the member attribute event will be — will be placed.
So, for the last, quite some time, I guess, like, two years. We got enormous amount of requests to support different discovery options. Because Hazelcast, very similar to Artifactory HA were trying to be self-contained and we don’t want to force developers to use any particular technology, so that’s why all Discovery that works out of the box. We don’t use any external dependency, we’re not using JGroups, it’s limited by us. So that’s why it’s just one jar which contains like a seven megabyte jar, which contains everything that you need. But what the things that we struggle, we want to get the answer for community and finally we have this answer with technology called Discovery SPI.
For the — for some time, for a couple months, last year, we actually worked with our community members, including the gentleman that’s going to speak in this room after me, Ray Tsang. He is a Google Cloud platform evangelist. And he actually contributed a lot of, for example Kubernetes integration for Hazelcast, was contributed by him. Plus he also provided a lot of feedback while we designing the API.
So, in terms of Discovery. So now, if you want to use TCP/IP discovery, which is fine, you can just put it on the configuration file. Sometimes or most of the times when we’re running inside a cluster, we don’t know IP address or we don’t know host up front. So in microservices world where the components like Hazelcast and Artifactory HA can be part of the big infrastructure, in this case, using Discovery SPI, this project can get this information from — seems like a console, or Zookeeper. Or, right now, we also provide integration with other cloud providers including Azure. Amazon. People want to run the Artifactory HA or — and Hazelcast in containers, we also provide the ability to use Discover using the Docker networks. Plus, run the Hazelcast cluster within a Mesos cluster configuration system. And the past support is also essential.
So there’s a couple things that we — that we do for Discovery and stuff. Zookeeper, Kubernetes, Eureka, that works in Cloud Foundry and or, you know, Spring Cloud, […] stack.
Okay, so let’s see how — how this actually — how this actually looks like when we use — how this stuff is done in Artifactory. So, I’m going to use the same tools that you saw today. I’m using the management center to demonstrate some of the things that you probably might see in Artifactory HA. So I’m going to use management center. Oops, wrong button. Also happens. Okay. Okay, and now we’ll be able to connect to running Artifactory HA. So but let me return to my console and I will show you couple things.
So I have two processes that runs Artifactory HA right now. I want to make this one bigger so you will see that I’m actually running Artifactory. Yeah, there’s this. Artifactory HA. This is very nice ASCII — ASCII graphic. This HA node identifier. And when, this is because this is the first node when it started, I can see the message from. So this is — because there’s no […] in the first node, so there’s no other HA members so this is like master. And how I can see this, I can check information in database where I can see that I have a primary and I have a member roll of the cluster. So I’m actually, I’m looking to Oracle database and information about Artifactory. If I’m to […] the same — the same info in other member, okay. Oops. It should be — it should be one member. Anyways. Happens. All right. So. And I have the Artifactory HA running two members. And I’m not particularly interested in here to look inside, you know, the […] build script, I wanna show you how and what pieces of Hazelcast is used here.
So, first of all, what we can see here, different topics here yield to do notification of the components. If a job that calculates some Debian method data will paste the information, will print this information into this topic.
The different configuration change. If someone change the configuration of the Hazelcast — of the Artifactory HA, the Hazelcast topic will continue information what needs to be done on the member side. Plus, the maps here used as the cache for remote repositories. So this is information that I can see just, just looking inside. Another very useful insight that tools like Hazelcast can provide, is JMax console. With JMax console we also — if you want to hook up our enterprise inventory system with JMax and see the status of the cluster. We can get that information from the JMax. So for example, I can see here that the symbol for, which is another distributed concurrency control primitive is used, I can see here there is a symbol for that will be acquired if indexes is running by primary members. I can see the status of the topics, for example, where is the calculate Debian. Because I didn’t run any Debian in my machine so there’s no data, but sealed data will be available.
And again. All right. So this is — this is how it looks from — from perspective of Artifactory HA. And I’ll have a couple slides with some key takeaways enough that I can take some of the questions. But, what we essentially saw today. Some interesting things. And this is my checklist, what I usually do after some conference. Yeah. We didn’t see some cat videos though. We saw some weird shit.
All right. Now. How can we make things even better? So distributed executor service for distributing the jobs. So right now, the Artifactory HA relies on the member. On the primary member to do this housekeeping stuff, like running indexes, running garbage collection, and et cetera. We distribute executor services that is part of the Hazelcast computing capabilities. It allows to do things like running the job on the particular node, or running the jobs on a set of nodes, by a particular set of the keys. So data will be sort of pointer where the job will be executed. So destination affair or like data where executor service.
Another option that require for the systems like Artifactory HA that when they go into masterless mode, is ability to do fail over of the jobs on other members, which is also will be possible in Hazelcast three-seven that will be released in a couple weeks.
Entry processor to do atomic data notifications. So right now, the Artifactory HA relies on the logs, so in this case, the situation of pessimistic logs, you always need to log before they do some of the updates of the data. Entry processor allows to avoid do pessimistic logs. It’s not even — it’s not even locked by itself. Entry process allows to do atomic preparation of the data without any logs or like optimistic and pessimistic logs. Though optimistic API is available, operations like put […] also available cluster wise.
And Discovery SPI for cloud and custom topologies. So right now before we actually introduced this in Artifactory HA was long time user of the Hazelcast for these purposes. Some of the things what did you use in the member listener and other stuff right now can be done using Discovery SPI which would be standard implementation with the things what they do. They can choose different implementations, different — for different approaches, for different client from cloud for writing on premises and et cetera.
And this is most — I think it’s most important slide here, right. So you can always write your own cluster software. Right? You can always write this and we know the many of us like very savvy engineers we do this not once, maybe two times, three times. However, you know, it will get you from the point A to B. Right? It has two wheels, you can steer, you can drive it but if it doesn’t need to be like a painful ride or it can be the nice ride where you can just enjoying the ride, don’t think about the things. And the pragmatic approach that developers of Artifactory HA took is not to reinvent the wheel or invent the bicycle and use the open source — open source technology and to build it in their product and focus on the things that really matters for their clients. It’s resiliency, disaster recovery, and availability of artifacts in their DevOps infrastructure. So I’ll talk some questions right now. Thanks for your time.