The Hacker Mind: Follow The Rabbit

Robert Vamosi

March 30, 2022

Hackers often make it look easy when in fact they started with no plan and were just following their curiosity, going down paths erratically just like a rabbit.

Researchers Nir Ohfeld and Sagi Tzadik join The Hacker Mind to talk about their presentation at Black Hat Europe 2021 on the ChaosDB vulnerability. It’s about how they started with a deliberately misconfigured version of CosmosDB and ended up with complete unrestricted access to the accounts and the databases of thousands of Microsoft Azure customers.

The Hacker Mind is available on all podcast platforms.

[Heads Up: This transcription was autogenerated, so there may be errors.]


In hacking, what is it like to go down a rabbit hole, to keep uncovering more and more access to places you really shouldn’t have access to? I’m not necessarily talking about all the specifics (those things vary, of course); I’m interested in the hacker-mind curiosity that gets you from a simple query to a real and unforeseen result. We’ve seen hints of this in TV shows such as Mr. Robot.

Owner: May I help you with something?

Elliot Alderson: I like coming here, because your WiFi was fast. I mean, you're one of the few spots that has a fiber connection with gigabit speed. It's good. So good. It scratched that part of my mind. Part that doesn't allow good to exist without condition. So I started intercepting all the traffic on your network. That's when I noticed something strange. Then I decided to hack you. I know you run a website called Plato’s Boys.

Owner: Pardon me?

Elliot Alderson: You are using Tor networking to keep the servers anonymous. You made it really hard for anyone to see it. But I saw your onion routing protocol. It's not as anonymous as you think it is. Whoever's in control of the exit nodes is also in control of the traffic, which makes me the one in control.

Let’s unpack that. Fast gigabit WiFi leads Elliot to discover a Tor network, which leads him to the onion routing protocol. He’s somehow able to control the exit node, which leads him to child pornography. And he did all that while having coffee at a local shop.

Mr. Robot’s a good show, for the most part, but what is it really like to start down one path and find that it leads to another path, and then another, and then another? Like chasing a rabbit. You might start out with an innocent question, like: if someone misconfigured their cloud service, could it lead to holding the keys to Microsoft’s Azure Service Fabric framework? And if it did, what might that process look like? What decisions would you make to get there? In this episode I’ll talk with two security researchers who simply followed the rabbit. In a moment we’ll find out where it led them.

[Music]

Welcome to The Hacker Mind, an original podcast from ForAllSecure. I’m Robert Vamosi and in this episode I’m talking with two researchers who put on their black hats and discovered, quite by accident, complete unrestricted access to the accounts and databases of several thousand Microsoft Azure customers. It’s an interesting story, one they presented at Black Hat Europe in 2021, so I hope you’ll stick around.

[Music]

Vamosi: Cloud security is relatively new, and as such it probably hasn’t gotten as much attention as it deserves. In Episode 28 I talked with Ophir Harpaz and Peleg Hadar about fuzzing the hypervisor used in Microsoft’s Azure cloud. It was a pretty serious vulnerability, rated 9 out of 10 in severity. I’m happy to report that Microsoft addressed it quickly.

Recently, though, another set of researchers found another vulnerability in Azure. This one, ChaosDB, was also resolved quickly, but what happened in Azure could also happen in other clouds. I mean, anyone can be guilty of a bad configuration. So I wanted to learn more about it.

Ohfeld: My name is Nir Ohfeld. And I'm a security researcher at Wiz.

Tzadik: My name is Sagi Tzadik and I'm also a security researcher at Wiz.

Tzadik: Wiz is a cloud security company, and we have a platform that helps big customers manage their cloud environments. It supports all the cloud environments you probably know of, like AWS, GCP, and Azure, basically everything you would need it to support.

Vamosi: Cloud is a hard concept to get across on a podcast with no visuals. Typically you see a diagram of a cloud as boxes within boxes, so let’s imagine a building. There’s an entryway, and there are various corridors; these corridors open to other corridors, and so the sections of the building are like boxes. To access these sections of the building there have to be doors, too, and behind these doors are the various companies, the various cloud clients, with all their virtual machines that can be spun up or spun down.

When organizations move to the cloud, one of the first choices is who is going to manage your security. A simple analogy would be who is going to decide which doors to lock and which to keep open. There are services that can help you do this, or you can try to do it all yourself. That’s what Nir and Sagi started looking at: the tools out there for managing cloud services, and what could happen if someone set them up all wrong.

Ohfeld: Generally, from our point of view, we split cloud services into two categories: self-managed products, like the raw cloud infrastructure, and managed products. You can set up PostgreSQL on your own EC2 instance; that is self-managed, you own the security and everything around it, and that's your problem. Or you can choose to use managed products, which means the cloud service provider provisions resources for you to use, and if there is a vulnerability that needs to be patched in one of the products they allocate to you, it's their responsibility to do that. To be able to do such things, they have to have some stuff installed that you're not really aware of, in order to monitor and manage your instance. What we find interesting is researching managed services, because we believe the cloud service provider treats them differently from self-managed services. So there might be more to play with, more stuff to find.

Vamosi: One of the big benefits of cloud services, as I mentioned before, is elasticity. If your work expands, you can spin up more instances, and if your work decreases, you can spin them down again. Since they are virtual, this is a huge cost savings over provisioning physical servers, and then having some of those servers sit dormant. Keeping track of all that makes having a managed service attractive.

Ohfeld: I think most people would rather use the managed stuff because it's easier. It's easier to use because you don't have to do as much; that's part of the magic of the cloud. You don't have to own security. And if your workload gets big enough, it will expand automatically, and you don't have to worry about the underlying resources or how many virtual machines you need to handle that huge workload.

[Music]

Vamosi: At Black Hat Europe 2021, Nir and Sagi presented a vulnerability in Cosmos DB, one of these managed cloud services. It turns out that Cosmos DB is used by some of the world’s largest organizations to manage massive amounts of data in near real-time. So imagine the prize if you could somehow get access to all of that.

Ohfeld: Azure Cosmos DB is a managed database solution offered by Azure. This means that if you're an Azure customer, you can use Cosmos DB to manage the data for your applications. It's quite a popular service among our customers, and that's one of the reasons we chose to research it. We've seen many of our customers using it, and we wanted to make sure that if you can configure it wrong, we will be able, as a product, to tell you that you configured it wrong so you can get it right.

Vamosi: To give some perspective, CosmosDB powers critical business functions like processing millions of prescription transactions or managing customer order flows on e-commerce sites. So it’s super important.

Tzadik: If there's a vulnerability in Cosmos DB, since this is a managed service, we believed it could affect a lot of customers, including our customers using Cosmos DB. So we felt it was important for us to audit it. We also have a problem in that cloud vulnerabilities do not usually get CVEs, and that's one of the reasons we name our vulnerabilities. Normally you would reference the CVE number, but since there are no CVEs for cloud vulnerabilities, instead of saying "the Cosmos DB vulnerability," we would rather use a catchy name: ChaosDB, in this case. It wasn't scored because, as I said, there is no CVE, but we did get a bounty from Azure. At the time, it was the maximum bounty offered by Azure, which is $40,000. So this means that Azure probably understood how severe this vulnerability actually was, and they gave us a bounty accordingly.

Vamosi: So, reading about ChaosDB, you might think that they set out to prove that they could access thousands of Azure accounts using Cosmos DB. Actually, that’s not how it happened. They had a simple goal, to misconfigure Cosmos DB, and from there they let their curiosity lead the way. Which is what a bad actor might do.

Tzadik: It's like following the rabbit. We set up a Cosmos DB account and found an interesting feature that lets me execute code, but I don't have root privileges. What do I do? Okay, let's attempt to elevate our privileges. And then we found that one thing led to another, which made the research quite fun. Yeah, like a capture the flag competition: you keep advancing in some direction. It's been really, really fun.

[Music]

Vamosi: Serendipity. One of the fun aspects of doing research is that when you start a project, you often don’t know the outcome ahead of time. It’s a happy set of accidents that leads one through the process. So Nir and Sagi were originally looking for misconfigurations that users can make when they try to set up their Cosmos DB instance on their own.

Ohfeld: While we browsed all the features in the user interface, trying to set up the least secure Cosmos DB instance possible, we found that Cosmos DB lets customers use a product called Jupyter Notebook. Jupyter Notebook is a product that lets customers represent their data in cool ways using code. Because we were already familiar with Jupyter Notebook, we knew that it lets its users execute arbitrary code. For us, as researchers, we can't look at a place that lets us execute arbitrary code and not dive in.

Vamosi: Right. If I, as an attacker, see a place where I can run arbitrary code, why not start there? So they started with a basic network recon of the Jupyter Notebook environment to see what’s what.

Ohfeld: During the recon, multiple questions came up, like: What is this instance running? Is it in my account? Is it in someone else's account? Does this instance expose the customer to other customers? To answer some of that, we tried to elevate our privileges.

Vamosi: There’s this concept of least privilege when sharing data. You might only want someone to see the data, not touch it. You might not even want them to see it. But if you can elevate your privileges, then you can do a lot more. You can potentially change the data.

Ohfeld: Privilege escalation bugs are pretty common, and we believed that if we put enough resources into finding one in the Jupyter Notebook implementation, we would eventually find one. As it turns out, it wasn't that difficult. After clicking on enough buttons in the user interface, we eventually found a way to escalate our privileges.

Vamosi: That’s how I beta test things. I click randomly until something breaks. I remember calling Symantec and reporting a serious bug in an early version of their password manager. When I told them the random keystrokes I’d entered, they shouted back, “Why would anyone strike those keys in that order?” Why indeed? The fact of the matter is that it opened the password manager without any password, so anyone could read its contents in clear text. It doesn’t matter why someone would strike those keys; it’s still a vulnerability, right?

Ohfeld: By default, the Jupyter Notebook runs Python 3 as a low-privileged user named “Cosmos User.” But when we switched the programming language from Python 3 to C#, we found that the C# notebook actually runs with root privileges, not as the low-privileged Cosmos User. After elevating our privileges to root, we could conduct more thorough recon on the Jupyter Notebook environment, and we found that the Jupyter Notebook had some local firewall rules restricting network access. When we reviewed the firewall rules, we found some familiar IP addresses, like the IP address of the Azure Instance Metadata Service. For any of you who are not familiar with it, the metadata service is a static IP address available to every Azure virtual machine, which the customer can access to retrieve metadata about the virtual machine instance. So the developers of the Jupyter Notebook service found it important that customers could not access the instance metadata service IP address.

Tzadik: And we couldn't see the rules at first, because we didn't have root privileges yet when we started; initially, we executed code as the unprivileged user. We tried accessing this IMDS URL.

Vamosi: IMDS is the Instance Metadata Service, and this is what the developers of the Jupyter Notebook service blocked. So before they elevated their privileges, they couldn’t reach IMDS. Why might that be the case? This service holds metadata about the currently running virtual machine instance, such as storage, network configuration, and more. You simply send an HTTP request and retrieve information unique to each virtual machine (VM). They issued a request, and the result raised some questions.
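To make that HTTP request concrete, here is a minimal Python sketch of querying IMDS from inside an Azure VM. It follows Microsoft's public IMDS documentation: the well-known link-local address 169.254.169.254, a required `Metadata: true` header, and an `api-version` query parameter. The version string below is simply one published version chosen for illustration; the episode doesn't give the researchers' exact curl command.

```python
import json
import urllib.request

# Well-known link-local address of the Azure Instance Metadata Service (IMDS).
IMDS_INSTANCE_URL = "http://169.254.169.254/metadata/instance"


def build_imds_request(api_version: str = "2021-02-01") -> urllib.request.Request:
    """Build, but do not send, a request for the VM's instance metadata."""
    url = f"{IMDS_INSTANCE_URL}?api-version={api_version}"
    # IMDS refuses requests that lack this header.
    return urllib.request.Request(url, headers={"Metadata": "true"})


def fetch_instance_metadata(timeout: float = 2.0) -> dict:
    """Send the request. Only works from a VM that can actually reach IMDS;
    in the notebook container, the local firewall rules made this fail."""
    # IMDS must be reached directly, so bypass any proxy set in the environment.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(build_imds_request(), timeout=timeout) as resp:
        return json.load(resp)
```

Building the request separately from sending it makes it easy to inspect exactly what goes over the wire before running it anywhere.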

Tzadik: It's like curl, and then an IP address that everyone knows, and this failed, but we didn't quite understand why. Either this virtual machine does not have access to IMDS, or something is preventing us from using it, but we couldn't answer that because we didn't have root privileges. That's one of the reasons we decided to elevate our privileges to root, or at least to try, because there was stuff about our environment that we didn't quite understand and didn't have the means to answer. We were blind to everything.

Vamosi: So that’s when they switched to C# and elevated their privileges, and the landscape changed.

Ohfeld: After elevating our privileges to root, we could inspect the firewall rules. We found that the developers of the service didn't want users of the service to access certain IP addresses. One of them was the instance metadata service. Another was an IP address we weren't familiar with, which we later found out belongs to a server called the WireServer, another static IP address reachable from every Azure virtual machine instance. The firewall rules also prevented the virtual machine from accessing any IP address in the 10.0.0.0/16 subnet, which is a pretty big subnet of IP addresses.

Vamosi: Subnets can be used to separate networks logically for different purposes such as business functions, and sometimes for security and access control purposes.

Tzadik: And these are the traditional internal IP addresses. So this indicates that there is some internal network that the developers specifically tried to forbid us from accessing. But since we had root privileges, and these were local firewall rules configured using iptables, we could simply flush them using iptables.

Vamosi: iptables is a way to configure the IP packet filter rules of the Linux kernel firewall. The filters are organized in different tables, which contain chains of rules for how to treat network traffic packets. To answer some of their immediate questions, they issued the iptables -nvL command to view the local firewall rules configured on this machine. They wanted to see what network resources they could access. In the iptables output, they found a couple of interesting rules. One prevented them from accessing IMDS.

Now, ordinarily you can query IMDS over HTTP to retrieve metadata about your virtual machine instance. For some reason, the developers of the Jupyter Notebook feature did not want users to access it. Another interesting rule prevented access to the 10.0.0.0/16 subnet. At this point in the research, they had no idea what this subnet was; all they knew was that it was a subnet of internal IP addresses and that they weren’t supposed to access it.

Finally, there was a rule that prevented access to one very specific IP address. As before, they had no idea what this IP address actually was; all they knew was that the developers did not want them to access it. The good thing was that these were local firewall rules configured using iptables, and they were now running with root privileges. That meant nothing prevented them from removing these rules and accessing these network resources.

By simply issuing iptables -F, they removed these rules. In other words, they flushed the iptables rules.
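The rule inspection and the flush can be illustrated with a small Python sketch that parses rules in the format printed by `iptables -S` (easier to parse than the `-nvL` table mentioned above) and checks which destinations are blocked. The rule text is hypothetical: the episode describes three blocked destinations (IMDS, the WireServer, and 10.0.0.0/16) but not the exact rules or targets. The WireServer address 168.63.129.16 is the one Azure documents publicly. After `iptables -F`, the listing would show only the chain policy, so nothing is blocked any more.

```python
import ipaddress
import shlex

# Hypothetical `iptables -S OUTPUT` output reconstructed from the episode;
# the real rule order and targets are unknown.
RULES_BEFORE_FLUSH = """\
-P OUTPUT ACCEPT
-A OUTPUT -d 169.254.169.254/32 -j DROP
-A OUTPUT -d 168.63.129.16/32 -j DROP
-A OUTPUT -d 10.0.0.0/16 -j DROP
"""

# After `iptables -F`, only the chain policy line remains.
RULES_AFTER_FLUSH = "-P OUTPUT ACCEPT\n"


def blocked_networks(rules_text: str) -> list:
    """Return destination networks of OUTPUT rules that DROP or REJECT traffic.

    Simplified: ignores negated matches, interfaces, ports, and other chains.
    """
    blocked = []
    for line in rules_text.splitlines():
        args = shlex.split(line)
        if args[:2] != ["-A", "OUTPUT"] or "-d" not in args or "-j" not in args:
            continue
        destination = args[args.index("-d") + 1]
        target = args[args.index("-j") + 1]
        if target in ("DROP", "REJECT"):
            blocked.append(ipaddress.ip_network(destination, strict=False))
    return blocked


def is_blocked(ip: str, networks: list) -> bool:
    """True if the address falls inside any blocked network."""
    address = ipaddress.ip_address(ip)
    return any(address in network for network in networks)
```

For example, `is_blocked("10.0.3.4", blocked_networks(RULES_BEFORE_FLUSH))` is true, while the same check against `RULES_AFTER_FLUSH` is false. The /16 alone covers 2^16 = 65,536 addresses, which is why Nir calls it a pretty big subnet.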

Tzadik: You would expect that this wouldn't work, because it's not best practice to enforce firewall rules locally. They could have enforced them outside of our container or machine, using a separate resource. But when we flushed these rules, we could actually access these forbidden IP addresses. So this shows that someone thought about restricting network access in that environment, but they chose to do it using local firewall rules, which is not best practice, and we were able to take advantage of it.

Vamosi: Understand that for an attacker, this is an opportunity. Attackers look for exactly these kinds of weaknesses. With root privileges and the local firewall rules flushed, they could now access IMDS. Apparently the developers assumed no one would have root access.

Ohfeld: Enforcing the firewall locally assumed that an attacker can't gain root privileges, when in reality, as we mentioned before, privilege escalation bugs are pretty common.

Tzadik: Yeah, you could use a one-day CVE, and these days there are quite a lot of them. I mean, there were two released this week. So even if we hadn't found the bug that let us run code with root privileges using the C# notebook, as Nir mentioned earlier, we could still look for a one-day CVE and attempt to use it before Microsoft patches it. So this is not best practice, in my opinion.

Vamosi: We’re accustomed to talking about zero-days, vulnerabilities the vendor has had zero days to fix. A one-day, by contrast, is a recently disclosed vulnerability that many people may not yet have patched or mitigated, but will shortly.

Tzadik: And this is only the middle of the story. The good part is still to come.

Vamosi: It’s easy to lose perspective. On TV it’s bam, bam, I have root on a system I’ve never seen before. But how long does it take to find the vulnerability and exploit it? Weeks? Months?

Tzadik: Less than a week, like four or five days? Something like that.

Vamosi: Wow.

Tzadik: From the day we started looking for Cosmos DB misconfigurations until August 9, when we actually found the vulnerability and reported it to Microsoft, it took us about a week of following the rabbit and hard work. Four or five days, which was more like eight to 10 days' worth of work, because we were working long hours and because we really enjoyed it.


[MUSIC]

Vamosi: Nir and Sagi now know that firewall rules had been keeping them from seeing what’s up.

Tzadik: Okay, so as Nir just said, there were three IP addresses that the developers of the service didn't want us to access, but they chose not the best way to enforce it. Since we were running as root, we removed these firewall rules, so they didn't exist anymore and we could actually access those addresses. So we ran curl. One of the things we found interesting is that the subscription ID that IMDS returns is not our subscription ID. Just to make it clear, a subscription is essentially an account ID, like who owns this machine. And since this is not a subscription ID that we're familiar with, this means that this is not our machine either.

Vamosi: They noticed another interesting thing here. The operating system type was set to Windows. This was strange, since they were obviously running Linux commands in a Linux terminal, so why was it set to Windows? After digging a bit more into their environment, they determined that they were accessing their host machine’s metadata service, not the metadata service of their container. In other words, their host machine was actually a Windows virtual machine hosting this container using Hyper-V.
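The two clues here, a subscription ID they didn't recognize and a Windows OS type seen from a Linux shell, can be checked mechanically. This Python sketch reads the `compute.subscriptionId` and `compute.osType` fields of an IMDS instance document. The sample document and GUIDs are made up, and `my_subscription_id` and `local_os` are hypothetical inputs standing in for what you know about your own account and shell.

```python
def inspect_metadata(metadata: dict, my_subscription_id: str, local_os: str) -> dict:
    """Flag the two mismatches described in the episode.

    metadata: parsed JSON from the IMDS /metadata/instance endpoint.
    my_subscription_id: the subscription you believe you run in (hypothetical input).
    local_os: OS of the shell you are typing into, e.g. platform.system().
    """
    compute = metadata.get("compute", {})
    subscription = compute.get("subscriptionId", "")
    os_type = compute.get("osType", "")
    return {
        # A subscription ID you don't recognize suggests the VM isn't yours.
        "foreign_subscription": subscription.lower() != my_subscription_id.lower(),
        # IMDS describing a Windows VM while your shell is Linux suggests
        # you are seeing the host's metadata, not your container's.
        "os_mismatch": os_type.lower() != local_os.lower(),
        "reported_os": os_type,
    }


# Made-up sample shaped like an IMDS response (most fields omitted).
SAMPLE_METADATA = {
    "compute": {
        "subscriptionId": "11111111-1111-1111-1111-111111111111",
        "osType": "Windows",
    }
}
```

Called as `inspect_metadata(SAMPLE_METADATA, "00000000-0000-0000-0000-000000000000", "Linux")`, both flags come back true, which is exactly the combination that told the researchers they were looking at someone else's machine.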

Tzadik: Either Microsoft provisioned it for us, which makes sense, but then the second iptables rule, which forbids us from accessing the internal network, the 10.0.0.0/16 subnet, indicated that there are more IP addresses internally that we could access. That made this a bit more interesting: maybe there are more machines like ours that we can access this way.

Vamosi: This is that moment when a hacker walks an ethical edge. This is when someone could start to do some serious damage. But, of course, we’re talking to ethical hackers, who are only curious about what else is available to them so they can document it and inform Microsoft.

Tzadik: And according to the metadata, yes, we also had an IP address in that subnet, which means we could probably access those machines. But as Nir also mentioned, there was this IP address we weren't familiar with, and when we searched for it on Google, we found out that it belongs to something called the WireServer.

Vamosi: Azure virtual machines typically run an agent, and the WireServer IP address is used for communication between Azure and that virtual machine agent. You can think of the WireServer as the backend of these agents, supplying any information an agent needs in order to function properly. The agent contacts the WireServer to retrieve information about the machine’s extensions or secrets, among other things. Extensions are software applications that Azure manages, either first-party software like Azure’s Log Analytics agent, or third-party software that Azure supports. So when a user installs an extension on an Azure virtual machine via the Azure portal, the WireServer instructs the virtual machine’s agent to install that extension, supplying the appropriate configuration. Interestingly, Microsoft offers almost no official documentation for the WireServer.

Ohfeld: So after researching that, we wondered which extensions the underlying virtual machine hosting our instance has. Because obviously, we're querying something that is not our own, and maybe learning the extensions and their configuration would help us learn more about the hosting virtual machine and the environment we're running in.

Vamosi: So here they learned a few more things. For example, before the agent can retrieve any configuration, let alone extension configuration, it must first fetch something called the Goal State. What’s a Goal State? The Goal State is, among other things, a list of endpoints that the agent needs to contact in order to fetch different configuration settings. What Sagi and Nir found is that they could download the virtual machine’s Goal State with a single curl command. This got them all the configuration endpoints specific to that virtual machine.
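Here is a minimal Python sketch of building the Goal State request and reading the endpoints out of the response, based on our reading of the open-source Azure Linux agent (WALinuxAgent). It asks `http://168.63.129.16/machine/?comp=goalstate` with an `x-ms-version` header, and the response is XML whose `Configuration` element lists configuration endpoints. Treat the header value `2012-11-30` and the XML shape as assumptions to check against the agent's source. The sample XML is simplified and its URLs are made up.

```python
import urllib.request
import xml.etree.ElementTree as ET

WIRESERVER = "168.63.129.16"      # Azure's publicly documented WireServer address
PROTOCOL_VERSION = "2012-11-30"   # assumption: version string sent by WALinuxAgent


def build_goal_state_request(host: str = WIRESERVER) -> urllib.request.Request:
    """Build, but do not send, the Goal State request."""
    return urllib.request.Request(
        f"http://{host}/machine/?comp=goalstate",
        headers={"x-ms-version": PROTOCOL_VERSION},
    )


def configuration_endpoints(goal_state_xml: str) -> dict:
    """Map each child of <Configuration> (e.g. ExtensionsConfig, Certificates) to its URL."""
    root = ET.fromstring(goal_state_xml)
    config = root.find(".//RoleInstance/Configuration")
    if config is None:
        return {}
    return {child.tag: (child.text or "").strip() for child in config}


# Simplified, hypothetical Goal State; real documents contain more fields.
SAMPLE_GOAL_STATE = """\
<GoalState>
  <Incarnation>1</Incarnation>
  <Container>
    <RoleInstanceList>
      <RoleInstance>
        <Configuration>
          <SharedConfig>http://168.63.129.16/machine/example?comp=config&amp;type=sharedConfig</SharedConfig>
          <ExtensionsConfig>http://168.63.129.16/machine/example?comp=config&amp;type=extensionsConfig</ExtensionsConfig>
          <Certificates>http://168.63.129.16/machine/example?comp=certificates</Certificates>
        </Configuration>
      </RoleInstance>
    </RoleInstanceList>
  </Container>
</GoalState>
"""
```

`configuration_endpoints(SAMPLE_GOAL_STATE)` returns a dictionary keyed by `SharedConfig`, `ExtensionsConfig`, and `Certificates`, which mirrors the "list of endpoints" idea: each entry is another URL the agent would fetch next.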

Ohfeld: After contacting the WireServer and retrieving all the extensions installed on the virtual machine, what we found out is that although we were running inside a Linux environment, running bash commands, the underlying virtual machine was actually a Windows virtual machine, not a Linux one. That made it even more interesting. For each extension, the WireServer supplies a configuration, and each configuration is divided into two parts. There's the public settings section, which contains general information about the extension's settings. But the more sensitive information, like hard-coded credentials, passwords, or certificates, is found in something called the protected settings of the extension, where all sensitive information is encrypted. Now, we wanted to know what was in the protected settings of the extension.
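The public/protected split shows up in the JSON handler settings an extension receives. Below is a Python sketch that separates the two, assuming the commonly seen layout (`runtimeSettings`, then `handlerSettings` containing `publicSettings`, `protectedSettings`, and `protectedSettingsCertThumbprint`). The sample values are invented, and the "ciphertext" is just base64 of a placeholder string; real protected settings are encrypted to the certificate named by the thumbprint and can't be read without that certificate's private key.

```python
import base64
import json


def split_handler_settings(settings_json: str):
    """Return (public_settings, cert_thumbprint, protected_ciphertext_bytes)."""
    doc = json.loads(settings_json)
    handler = doc["runtimeSettings"][0]["handlerSettings"]
    public = handler.get("publicSettings", {})
    thumbprint = handler.get("protectedSettingsCertThumbprint")
    encoded = handler.get("protectedSettings")
    # Protected settings travel base64-encoded; decoding yields ciphertext, not plaintext.
    ciphertext = base64.b64decode(encoded) if encoded else b""
    return public, thumbprint, ciphertext


# Invented example in the assumed layout.
SAMPLE_SETTINGS = json.dumps({
    "runtimeSettings": [{
        "handlerSettings": {
            "publicSettings": {"logLevel": "info"},
            "protectedSettingsCertThumbprint": "0123456789ABCDEF0123456789ABCDEF01234567",
            "protectedSettings": base64.b64encode(b"placeholder-ciphertext").decode("ascii"),
        }
    }]
})
```

The thumbprint is the interesting part: it names which certificate's private key is needed, which is why the next step in the story is a hunt for certificates.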

Tzadik: It's the protected settings of an extension that belongs to a Windows virtual machine, while our own terminal is a Linux terminal. So there's a mismatch there. At this point, we suspected that it's an extension that belongs to the host machine: we are a guest virtual machine inside a hypervisor, and the extension we got belongs to our hypervisor host. We thought that was quite interesting. And since the stuff in the protected settings is essentially secret, we figured that if we decrypted this particular setting, we might get secrets that are used across the service generally. We cover all of this in our Black Hat talk and in the blog, which you are welcome to check out if you want the exact steps in way more detail; we're just giving you a brief overview. And if you have a question, you can ask us directly if something is not clear, because this is a very high-level overview without any slides. It's very hard to get the message across without actually showing the protected settings and the public settings.

[Music]

Vamosi: Okay, remember our office building analogy, and the rabbit running through the corridors? Every company using this cloud has a door along a main corridor. But Nir and Sagi are still on the outside, or at least at this point they’re in the lobby of this building, and maybe they’re starting to chase the rabbit up a staircase, encountering open doors and sometimes locked doors. It’s an imperfect analogy, but you get the idea. We’re following a rabbit through an unknown building that houses hundreds of companies and all their virtual machines.

Ohfeld: We wanted to decrypt the protected settings in order to retrieve secrets that maybe we didn't know about. As I mentioned, this is following the rabbit. We didn't have a precise goal for everything we did; we'd just wing it and maybe find something that works. So if there was a path where we could gain secrets we weren't supposed to access, we thought we'd try it. To understand how the protected settings are encrypted, we reviewed the code of the Azure virtual machine Linux agent, also known as the WA Agent. The WA Agent is open source software hosted on GitHub. It's written in Python, which made it easy for us to understand how the protected settings are supposed to be encrypted. After reviewing the code, we found out that the WireServer has other endpoints besides the extensions endpoint, including a certificates endpoint. The agent contacts the certificates endpoint to retrieve the keys used to decrypt the virtual machine extensions' protected settings. So we built a very nice, and really long, curl command. We sent it and expected to get back the encryption keys, which, according to the code, are supposed to come in a format called a PKCS #7 blob.

Vamosi: PKCS #7 is a standard syntax for storing signed and/or encrypted data. It is one of the family of standards called Public-Key Cryptography Standards (PKCS), created by RSA Laboratories.
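Decrypting a PKCS #7 (CMS) enveloped blob with a private key is a standard OpenSSL operation. The Python sketch below only builds and runs an `openssl cms -decrypt` command line; it is not the WA Agent's exact invocation. The file names and the DER input encoding are assumptions, and depending on the payload, the decrypted output may need further unpacking.

```python
import subprocess


def cms_decrypt_command(blob_path: str, key_path: str, cert_path: str, out_path: str,
                        input_format: str = "DER") -> list:
    """Build an `openssl cms -decrypt` command for a PKCS #7 enveloped blob."""
    return [
        "openssl", "cms", "-decrypt",
        "-in", blob_path, "-inform", input_format,  # encoding of the blob (assumed DER)
        "-inkey", key_path,    # recipient private key
        "-recip", cert_path,   # matching recipient certificate
        "-out", out_path,
    ]


def cms_decrypt(blob_path: str, key_path: str, cert_path: str, out_path: str) -> None:
    """Run the command; raises CalledProcessError if OpenSSL fails."""
    subprocess.run(cms_decrypt_command(blob_path, key_path, cert_path, out_path), check=True)
```

Passing an argument list rather than a shell string keeps file names from being interpreted by a shell. This is also the kind of command the researchers found useless once the response turned out not to be a PKCS #7 blob at all.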

Ohfeld: What we actually got back from the WireServer after executing the curl command was in a different format, some kind of proprietary format called a Certificate Bond Package.

Tzadik: So yeah, we got that Certificate Bond Package from the certificates endpoint. Initially, we tried to use the same OpenSSL commands we used to decrypt the PKCS #7 blob, but it didn't work. So we were like, okay, how do we decrypt this Certificate Bond Package format? We tried googling it, and Google at that time had exactly zero answers showing how to decrypt this format.

Vamosi: Another stone wall. Where do you go from here? Nir and Sagi decided to reverse engineer the clients of the WireServer, the VM agents. They assumed that if anything knew how to decode this format, it would be these agents, which rely on this information to function properly. These extensions were most likely installed on their host, the Windows-based VM, and not on their private Linux container. That means all responses from the WireServer were meant to be handled by the Windows agent, not the Linux one. And that was the breakthrough they needed to continue.

Ohfeld: And then we thought, okay, we have the source code for the Linux agent, and this is something we got back from the certificates endpoint of the WireServer, so the Linux agent, the WA Agent, should be able to decrypt it. Let's look at the code and see how it does it. But there was no reference to the Certificate Bond Package anywhere in the source code of the Linux agent, and we were a bit stumped. Then we remembered that according to IMDS, and according to the extension information we'd gotten so far, we were actually retrieving information for a Windows virtual machine and not a Linux virtual machine. Remember earlier, when I said that even though we're using a Linux terminal, IMDS tells us we're running inside a Windows virtual machine? So we thought, okay, maybe the Linux agent doesn't know how to decrypt the Certificate Bond Package format, but maybe if we set up an Azure virtual machine with the Windows agent installed, that agent will know how to decrypt it. So we did just that. We set up an Azure virtual machine with the Windows agent installed, which I think is named WindowsAzureGuestAgent.exe, if I recall correctly. We started examining it, and since it is written in .NET, it is very easy to decompile into something that resembles source code. Then we looked for the Certificate Bond Package, and this was the first time we actually found a reference to this package. And since this was .NET, it was quite easy to build something that uses the project's DLLs to decrypt the Certificate Bond Package format.

Vamosi: It seems so easy in retrospect, doesn't it? If you know where you're going, you might be able to guess these things. But they didn't know where the rabbit hole would lead them. They had to rely on their own experience to decide where to turn next. And they were generous about it. Remember the search results that turned up zero matches? They did something about that.

Ohfeld: So we wrote a small C# snippet, which you can also find on our blog, that actually decrypts this format. So if you ever encounter this format in the future, you can reuse our work to do that.

Vamosi: Okay, so the point here was to decrypt these certificates and see what's there. Where do the certificates come from? The answer is the Certificates endpoint. But to retrieve the certificates, the agent first needs to take an extra precaution and supply a self-signed transport certificate, which is used to encrypt the certificate bundle.
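A minimal sketch of the transport-certificate step, to show its shape. It assumes the agent has already generated a self-signed certificate (for example with openssl) and only shows how the public half could be packed into request headers. The header names, the `2012-11-30` protocol version, and the `DES_EDE3_CBC` cipher name are our reading of the open-source Linux agent (WALinuxAgent), not details given in this episode; treat them as assumptions.

```python
# Sketch only: header names and values are assumptions based on the
# open-source WALinuxAgent, not on the ChaosDB write-up.
PROTOCOL_VERSION = "2012-11-30"  # assumed WireServer protocol version


def pem_body(pem_text: str) -> str:
    """Drop the -----BEGIN/END----- armor lines and join the base64 body."""
    body = []
    for line in pem_text.splitlines():
        line = line.strip()
        if line and not line.startswith("-----"):
            body.append(line)
    return "".join(body)


def certificate_request_headers(transport_cert_pem: str) -> dict:
    """Headers asking the WireServer to encrypt the certificate bundle
    to our self-signed transport certificate (public half only)."""
    return {
        "x-ms-agent-name": "WALinuxAgent",
        "x-ms-version": PROTOCOL_VERSION,
        "x-ms-cipher-name": "DES_EDE3_CBC",
        "x-ms-guest-agent-public-x509-cert": pem_body(transport_cert_pem),
    }


if __name__ == "__main__":
    demo_pem = "-----BEGIN CERTIFICATE-----\nMIIB\nAAAA\n-----END CERTIFICATE-----\n"
    print(certificate_request_headers(demo_pem))
```

The bundle that comes back can only be opened with the transport certificate's private key, which the agent (and anything running with the agent's privileges) holds.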

Ohfeld: And so, now that we could actually decrypt the CertificatesBondPackage format, we did it. When we set up a Linux virtual machine, and also a Windows virtual machine, and decrypted the standard PKCS #7 blob, we got back two certificates at most, and this is what we expected to get with the CertificatesBondPackage as well. But in reality, when we decrypted the CertificatesBondPackage format, we actually got back 25 certificates instead of two.

Vamosi: Okay, that's certainly another wrinkle.

Tzadik: And some of them had very interesting names. We had the certificate for *.notebook.cosmos.azure.com, or something like that. We had a wildcard certificate for a domain that is owned by Microsoft and is actually used when you use the web console of Cosmos DB. So when you query your database using the web console, it makes an HTTP request to a domain that can be signed using this certificate. At this point, we knew that we were onto something; we knew that we probably shouldn't have the private key for this certificate. And there were a couple of other interesting certificates for which we had the private key.

Ohfeld: So at this point of our research, as Sagi mentioned, we felt that we were onto something, but then we were a bit stuck. We had a bunch of certificates. It was like getting lots of keys without knowing which doors they open. What do I do with all of these private keys? What are they useful for? So we said, okay, maybe we missed something during our research. Let's go back to the extension configuration we got from the WireServer. We also had the keys required for decrypting the protected settings, so let's decrypt them; maybe one of them will contain some interesting information. So we decrypted all the protected settings and found pretty much nothing. We didn't find anything interesting at that time.

[Music]

Vamosi: Another stone wall. Some interesting stuff, but no clear sense of how it all fits together.

Tzadik: We hoped to find credentials and things we could use in other interfaces, but we didn't find anything like that.

Vamosi: So they had decrypted the keys, but it wasn't clear how the keys could be used.

Ohfeld: Okay, we knew Azure a bit better by this time, so let's review all the extensions again. Maybe we'll find an interesting extension. And we saw that there was an extension called ServiceFabricNode. We had no idea what Service Fabric actually is. But the ServiceFabricNode extension had some pretty interesting information inside its fabric settings. First, it mentioned a certificate named ServiceFabricWestUS1, which was one of the 25 certificates we got back from the WireServer. And it mentioned a URL. So we took that exact URL, put it in Google Chrome, and browsed to it. After accessing that URL, an authentication prompt popped up requesting a client certificate, meaning this was one of the doors we were looking for.

Vamosi: An ah-ha moment.

Ohfeld: They wanted a certificate to let us in. So our best bet was to take the certificate we got back from the WireServer that was mentioned in the public settings of the extension. And it worked.

Vamosi: If you've been paying close attention so far, you'll recall that their jailbreak included removing local firewall rules from iptables that prevented them from accessing the 10.0.0.0/16 subnet, which appears in the manifest file. This means they could now access it freely. It also means they could access the local Service Fabric HttpGatewayEndpoint on port 19080 from their Jupyter Notebook container, which, as the manifest file suggests, could be authenticated using the fabric.westus1.cosmos.azure.com certificate.

Ohfeld: What we got back was a huge XML manifest file containing a bunch of interesting information. It mentioned port 19080, which was an HTTPS port. It also had a few mentions of the certificate, the same ServiceFabric certificate, which made us believe this certificate had some kind of significance. And it had multiple references to the words Service Fabric. So at this point, we asked ourselves, what is Service Fabric?

Vamosi: Yeah, what is it?

Ohfeld: So when we searched for Service Fabric on Google, we found out that it is a container orchestration solution, which actually powers Azure. But if we're being honest, from this point until the end of the research, we treated it as, ah, yes, Kubernetes.

Vamosi: Ouch. Kubernetes is the container orchestration system that came out of Google. It is platform-agnostic, but you realize it's Google deep down. So Service Fabric is, loosely, Azure's counterpart to Kubernetes. Officially, Service Fabric is an open source project at Microsoft, and it powers core Azure infrastructure as well as other Microsoft services.

Ohfeld: This is like the Microsoft version of Kubernetes. Let's treat it that way. And the manifest file also mentioned a few IP addresses in the 10.0.0.0/16 subnet. As a reminder, we're actually inside the 10.0.0.0/16 subnet, according to the Instance Metadata Service.

Tzadik: And we had removed the firewall rules that would have prevented us from accessing the 10.0.0.0/16 subnet, so we could actually access those addresses.

Ohfeld: So after conducting a port scan on that subnet, we found out that some of the endpoints in it were listening on port 19080. After learning a bit about Service Fabric, we found out this is the Service Fabric management port.
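At its simplest, the port scan described here is a TCP connect check against every host address in the subnet. A minimal standard-library sketch follows; the worker count and timeout are arbitrary choices of ours, and in practice a dedicated scanner such as nmap would more likely be used.

```python
import ipaddress
import socket
from concurrent.futures import ThreadPoolExecutor
from typing import List

SF_HTTP_GATEWAY_PORT = 19080  # port named in the manifest


def is_port_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


def scan_subnet(cidr: str, port: int = SF_HTTP_GATEWAY_PORT, workers: int = 256) -> List[str]:
    """TCP-connect scan every usable host in `cidr`; return hosts with `port` open."""
    hosts = [str(ip) for ip in ipaddress.ip_network(cidr).hosts()]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(lambda h: is_port_open(h, port), hosts))
    return [h for h, ok in zip(hosts, results) if ok]
```

Calling `scan_subnet("10.0.0.0/16")` would probe 65,534 addresses, which is why the connect checks run concurrently.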

Vamosi: Oh, crap. This is the main way that Microsoft services its Azure clients.

Ohfeld: And in order to communicate with that management port, there is a standard command-line tool offered by Microsoft called sfctl, the Service Fabric command-line tool. So we tried to connect to these management ports in the 10.0.0.0/16 subnet. Obviously, that requires some kind of authentication, and the authentication process requires a certificate. Until then, there was just one certificate that had all the answers; we only ever used that one certificate, and it always worked. So we used the ServiceFabricWestUS1 certificate and, actually, it worked. We were able to authenticate to the cluster. And because we didn't know anything about Service Fabric, we tried to fuzz it. We ran sfctl --help and tried each of the commands the tool offers. After some trial and error, we issued the command sfctl application list, and it spit out something like 500 Cosmos DB instances.
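`sfctl application list` is a front end for the Service Fabric HTTP gateway on port 19080. The sketch below assumes, based on our understanding of Microsoft's public REST reference, that the gateway lists applications at `GET /Applications?api-version=...`, pages results with a `ContinuationToken`, and returns an `Items` array whose entries carry a `Name`. The API version string and all function names are our own assumptions. Client-certificate authentication uses only the standard `ssl` module.

```python
import json
import ssl
import urllib.parse
import urllib.request
from typing import Callable, List, Optional, Tuple

API_VERSION = "6.0"  # assumption: any version the cluster supports


def build_list_url(endpoint: str, continuation: Optional[str] = None) -> str:
    """URL for one page of the (assumed) 'list applications' REST call."""
    query = {"api-version": API_VERSION}
    if continuation:
        query["ContinuationToken"] = continuation
    return f"{endpoint.rstrip('/')}/Applications?{urllib.parse.urlencode(query)}"


def parse_page(body: str) -> Tuple[List[str], Optional[str]]:
    """Return the application names on this page and the next continuation token."""
    page = json.loads(body)
    names = [item["Name"] for item in page.get("Items", [])]
    return names, page.get("ContinuationToken") or None


def https_fetcher(cert_file: str, key_file: str, cafile: Optional[str] = None,
                  timeout: float = 10.0) -> Callable[[str], str]:
    """Build a GET function that presents a client certificate (mutual TLS)."""
    ctx = ssl.create_default_context(cafile=cafile)
    ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)

    def fetch(url: str) -> str:
        with urllib.request.urlopen(url, context=ctx, timeout=timeout) as resp:
            return resp.read().decode("utf-8")

    return fetch


def list_applications(endpoint: str, fetch: Callable[[str], str]) -> List[str]:
    """Follow continuation tokens until every page has been read."""
    names: List[str] = []
    token: Optional[str] = None
    while True:
        page_names, token = parse_page(fetch(build_list_url(endpoint, token)))
        names.extend(page_names)
        if not token:
            return names
```

Usage would look like `list_applications("https://<node>:19080", https_fetcher("client.pem", "client.key"))`, with hypothetical file names. Injecting `fetch` keeps the paging logic testable without a live cluster.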

Vamosi: Okay, so this is like hitting the jackpot: 500 Cosmos DB instances. But there was still work to be done. For example, could they look inside them?

Ohfeld: And for each Cosmos DB instance, we got a bunch of encrypted authentication tokens. We got the encrypted form of the primary key for the database. At this point, we knew that the primary key for the database is equivalent to the root password in a traditional database, meaning it allows full read and write access. But as I just said, all of these authentication tokens were encrypted, so they were useless to us. No endpoint will accept an encrypted form of an authentication token. But up to this point in our research, we had only used a fraction of the certificates we got back from the WireServer; we had only used the one for Service Fabric. So what we tried was to take each and every one of the encrypted authentication tokens and decrypt it with each and every one of the certificates we got back from the WireServer. And now, knowing the answer, we feel a little dumb, because the encryption certificate had a pretty obvious name. Its name was fabricsecrets.cosmos.azure.com.
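Trying every token against every certificate is a simple nested loop: stop at the first key that decrypts each token, then count which keys did the work. In ChaosDB the real primitive was decryption with the private keys from the WireServer bundle; here it is swapped for a toy cipher (`toy_encrypt`/`toy_decrypt`, hypothetical and not real cryptography) so the loop runs on its own. The key and token names are illustrative.

```python
import hashlib
from collections import Counter
from typing import Callable, Dict, Mapping, Tuple


# Stand-in cipher for illustration only (NOT real cryptography): a 4-byte
# tag derived from the key, followed by the plaintext XORed with the key.
def toy_encrypt(plaintext: bytes, key: bytes) -> bytes:
    tag = hashlib.sha256(key).digest()[:4]
    return tag + bytes(b ^ key[i % len(key)] for i, b in enumerate(plaintext))


def toy_decrypt(blob: bytes, key: bytes) -> bytes:
    """Raise ValueError when the tag does not match, like a failed decryption."""
    if blob[:4] != hashlib.sha256(key).digest()[:4]:
        raise ValueError("wrong key")
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(blob[4:]))


def try_all_keys(tokens: Mapping[str, bytes], keys: Mapping[str, bytes],
                 decrypt: Callable[[bytes, bytes], bytes]) -> Dict[str, Tuple[str, bytes]]:
    """For each token, record (name of the first key that decrypts it, plaintext)."""
    recovered: Dict[str, Tuple[str, bytes]] = {}
    for token_name, blob in tokens.items():
        for key_name, key in keys.items():
            try:
                recovered[token_name] = (key_name, decrypt(blob, key))
                break
            except ValueError:
                continue  # wrong key, try the next certificate
    return recovered


def keys_used(recovered: Mapping[str, Tuple[str, bytes]]) -> Dict[str, int]:
    """Count tokens per key; one dominant key means shared encryption."""
    return dict(Counter(key_name for key_name, _ in recovered.values()))
```

If every tenant's token lands on the same key in `keys_used`, that is the cross-tenant finding described next.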

(beat)

To our surprise, just one certificate was used to encrypt the sensitive authentication tokens of all tenants. Every tenant's tokens had been encrypted using that single certificate, which was really surprising for us. At first, we believed that maybe this certificate would only be able to decrypt the authentication tokens of our own tenant. And one more thing: as I mentioned, according to the Microsoft documentation and various other sources, Service Fabric is a core component inside Azure. As I said before, it's what actually makes some of the Azure services function. So, from our understanding of the documentation and our research, authenticating to Service Fabric is like authenticating to the service's control plane. It's a very high-value core component that we connected to over the internet.

Vamosi: So, over the internet, with all they'd done, they were able to find the keys to the kingdom. But could they only see their own tenant, or could they see other tenants?

Tzadik: But in practice, it did manage to decrypt the authentication tokens of other tenants, of other customers, and some of the credentials we got after decrypting the secrets were the primary keys for their databases. So we could use them to authenticate to other customers' databases. And the sfctl application list command also reveals the name of the database, so we had the name and we had the password. We could just authenticate to it.

Vamosi: Again, this is that ethical moment when a bad actor could plunder these other tenants. They have the name and the password. But it gets worse. They found another way, one using the Jupyter Notebook, that could access these tenants as well. And in many ways this was even worse, at least from a forensics standpoint.

Tzadik: And there's also a token for the notebook service. So if we didn't want to use the primary key, we could use the token for the notebook service instead, and it would appear as something else in the logs. And there was a further authentication token we managed to decrypt, one for an Azure Storage account. This is the storage account where your work is saved: say you made a notebook that queries a database and does some calculations, you can save your progress to an Azure Storage account. So we could actually have accessed that Azure Storage account and modified your notebook. Then the next time you, as a customer, use that notebook, you will actually be the one querying your database and modifying the data, which would be insanely hard to track, because you're the one doing the modifications.

Vamosi: Let's think about that. You, as a bad actor, are logging in as someone else and making modifications, but when the victim looks back at the logs, they would show that the victim made the modifications. This is not good.

Tzadik: So we had a couple of interesting attack vectors using these certificates, which was fun. And yeah, I'll continue. So we had this Service Fabric that we had authenticated to locally using the certificate we got. But let's take this one step further. Let's see if there is a Service Fabric instance that is accessible from the internet and will actually accept our ServiceFabricWestUS1 certificate, because this seems to be the certificate that opens the most doors.

Vamosi: Okay, so this is a significant escalation. They want to take the certificate they found just by chasing the rabbit from here to here to here and use it from outside the virtual machines they are in. They want to see if they can access these other accounts from the internet, in their case from an IP address in Tel Aviv, Israel.

Ohfeld: So what we actually did was scan the entire internet for port 19080 to see how many Service Fabric instances belong to Microsoft, are externally facing, and will actually accept the certificates we have. It turned out that we managed to authenticate to over 100 Service Fabric instances over the internet. I think this was the first time an IP address from Tel Aviv, Israel, actually authenticated to these Service Fabrics over the internet, and it probably raised some alarms inside Microsoft's offices, but we can't tell. Most of these Service Fabrics belong to the Cosmos DB service, but some of them belong to other services. We have a screenshot on our blog, so you're welcome to check it out. But we didn't have the time or resources to cover all of the Service Fabrics and figure out their function and purpose. So we left it at that and sent this report to Microsoft.

[Music]

So, they started out just trying to misconfigure a popular service in Azure. Then they set up a Jupyter Notebook container on Azure Cosmos DB, ran C# code in it to obtain root privileges, removed firewall rules set locally on the container to gain unrestricted network access, queried WireServer to obtain information about installed extensions, certificates, and their corresponding private keys, connected to the local Service Fabric, listed all running applications, obtained the primary keys to other customers' databases, accessed Service Fabric instances in multiple regions over the internet, and then responsibly reported all of this to Microsoft.

This is a pretty significant compromise, a big vulnerability in the Azure cloud.

So this is the part where, researchers tell me, everything can go wrong. The vendor has the information, but the researchers don't hear back. Sometimes it's because the vendor doesn't have the infrastructure to handle outside reports of vulnerabilities. Sometimes it's because the vendor does, and they simply need time to investigate because there are dependencies. Or maybe it's similar to something else they knew about, which requires more time to get to the bottom of it all. And then there's the case where the vendor responds, and responds fairly quickly. That was the case with Microsoft.

Tzadik: Actually, Microsoft responded really well. They answered us in... I think you'll remember the timeline. Yeah, I remember that time.

Ohfeld: We want to say they responded really well, and we want to emphasize it, because it's very, very impressive. Yeah. We highly appreciated it. It's not standard. Yeah. So we sent the advisory to Microsoft and let them know about this issue, and less than 48 hours after the report, we noticed that the vulnerable feature, the Jupyter Notebook feature, was disabled for all Cosmos DB customers. And it's funny to note that they disabled the feature, and to this day, this feature is still disabled.

Tzadik: Yeah, we checked it right before the podcast, and this feature is still disabled.

Ohfeld: We find that very funny. Four days after we sent the report, Microsoft responded to us, acknowledging it. We also noticed that on the same day they acknowledged our report, some of the credentials we obtained during our research started to be revoked, and things like certificates and authentication tokens no longer worked. Then, only five days after our initial report, Microsoft awarded us the maximum bounty available for Azure, $40,000. This is the fastest bounty we've ever received from any vendor: only five days after a report that was sent on a Friday. It's very impressive.

Vamosi: That is impressive, and I've noted how Microsoft tends to take vulnerability reports seriously of late. Which is good for all of us.

Ohfeld: Eleven days after our initial report, we had further discussions with the Microsoft security team, and they confirmed to us that several thousand customers were affected by this vulnerability. Thirteen days after our report, Microsoft sent an email notification to the affected customers. Something important to note is that Microsoft chose to send an email notification only to customers that had the Jupyter Notebook feature enabled during our research period, which was less than a week. We feel that Microsoft should have emailed all potentially impacted customers, because although this vulnerability is now patched, we can never be sure that it wasn't exploited earlier.

Vamosi: That's a good point. Microsoft should have contacted all Cosmos DB customers. This shouldn't, however, distract from the fact that they took it seriously and acted quickly.

Tzadik: Yeah, Microsoft really did amazing work here. It took them only a couple of weeks to fix this severe vulnerability for all of their customers, which is amazing. We work with many vendors, and this is by far the best response we've ever gotten from a vendor. So really, props to them. But what's interesting about the impact of this vulnerability, and this is something Microsoft also covered in the email, is that we were essentially exposed to customer secrets, like the primary key for the database, and so was anyone else who attempted to exploit this vulnerability before we reported it. So Microsoft instructed their customers to actually revoke these keys. Microsoft can't revoke them on their customers' behalf, because that would break their applications. So customers had to take manual measures to mitigate this vulnerability, which is quite unique for a managed service, because usually the security is managed by the service itself. I mean, that is part of the reason you choose a managed service. And specifically, because the primary key for the database is a long string of base64 characters that you could use anywhere, it's very hard to track which applications use it. You could use it in PHP code hosted with another cloud vendor, you could use it in an Azure Function. There are a lot of places where you could include this string, this primary key. So I'm sure that mitigating this vulnerability from the customer side, as in, I'm a Cosmos DB customer and I need to update my keys because they could have been leaked, was very hard.
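On the customer side, the hard part Tzadik describes is finding every place a key is used before rotating it. A minimal sketch of such an inventory, assuming Cosmos DB keys appear as 88-character base64 strings ending in `==` and account endpoints as `https://<account>.documents.azure.com`; both patterns are our assumptions, and this will miss keys kept in secret stores or environment variables defined outside the code base.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Assumed shapes: an 88-character base64 key ending in "==", and an
# account endpoint of the form https://<account>.documents.azure.com
KEY_RE = re.compile(r"(?<![A-Za-z0-9+/])[A-Za-z0-9+/]{86}==")
ENDPOINT_RE = re.compile(r"https://[a-z0-9-]+\.documents\.azure\.com")


def scan_text(name: str, text: str) -> List[Tuple[str, int]]:
    """Return (name, line number) for each line that looks like it holds a key or endpoint."""
    return [
        (name, lineno)
        for lineno, line in enumerate(text.splitlines(), start=1)
        if KEY_RE.search(line) or ENDPOINT_RE.search(line)
    ]


def scan_tree(root: str) -> List[Tuple[str, int]]:
    """Walk a source tree and scan every file that decodes as UTF-8."""
    hits: List[Tuple[str, int]] = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # binary or unreadable file
        hits.extend(scan_text(str(path), text))
    return hits
```

The resulting list of files and line numbers is a checklist of what has to be updated once a new key is issued.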

Vamosi: So that's good advice for any Cosmos DB customers on Azure: change your keys. This research, in the hands of a bad actor, could have ended very differently.

Tzadik: Just to clarify, we didn't save any of this information. That would be highly irresponsible. Right after we sent this report to Microsoft, we immediately deleted this information. So you don't have to worry about us. We are the good guys. Yeah, this is essentially the story of ChaosDB, the somewhat higher-level story of ChaosDB. And as I said previously, you can check out our blog and our Black Hat talk, where we cover it in much more detail.

Vamosi: I'd like to thank Sagi and Nir for sharing their Black Hat Europe talk, specifically their journey through Cosmos DB, and for shining a light on the WireServer and Service Fabric frameworks. It's interesting to hear how the real process of hacking unfolds, how one thing can lead to another in the hands of an experienced hacker. And it's also good to see Microsoft respond quickly, as so many vendors still do not. And, as I said at the beginning, cloud security isn't talked about as much as it should be. It's not a matter of "hey, guys, let's put everything in the cloud tomorrow." There needs to be some attention given to the configuration, and certainly to the private settings. Somehow I don't think this will be the last time I talk about cloud security.

For The Hacker Mind, I remain, with my feet firmly on the ground, Robert Vamosi.