Primitive Defence against Good Old cURL Scanning Attack
One of the founding weaknesses of the HTTP protocol was the binding of file systems directly to URL paths. Back in the days of Gopher and the evolution of web standards, somebody took a shortcut and found it easy to just publish a folder online, paths and all. One can only recall the famous parody song “typing the pathnames with my fingers”, which was played at some Apple conference in the days of Windows 98. This concept of pathnames, and its adoption into the HTTP protocol and hence into the URL scheme, in essence introduced the odd and strange cURL attack vector.
This cURL attack is still present in many content management and other web systems, much like the famous SQL injections. After all, the most widespread diseases are paradoxically also the most easily cured ones (or are they?). The root cause is that a URL maps directly to a folder in the file system, and as file system permissions are typically very primitive, the URL potentially provides access to the whole folder unless additional security mechanisms are introduced. And even if there are session-based mechanisms to prevent random HTTP requesters from accessing the whole folder, users who are already logged in are usually free to request whatever they find in those folders. Some legacy legal entities have hence tried to criminalise changing the URL parameters in the first place! How practical that would be from a legal, enforcement or basic human rights perspective remains a question, however.
Well, back to the cURL attack. We have the URL mapped directly to the file system, and most content management systems store and manage various items, or files, there. And as we have learnt from the number-based way of identifying things, some people think that if something does not have a number, it does not exist. Or that if things cannot be ordered, identified and numbered, there is some dangerous disarray present which needs to be sorted out. This mindset leads to a situation where most systems have neatly ordered numbers which correspond to what they hold. For example, users could be numbered 1, 2, 3, 4 and so on. Or documents. Or whatever there is. So far so good, but if you have the URL mapped directly to the file system, and things ordered there systematically, then you in essence have access to all of them, which might not be as planned in every case.
The cool command-line application cURL comes in handy here and provides an easy way to produce HTTP requests for a wide range of such ordered files. As many people know, even non-techies, many content management systems try to defend against this by various means. Perhaps the most common of them is session-based HTTP request verification, but more exotic solutions have also been seen in this world. One of the oddest so far was the introduction of nonlinear sequences in order to prevent overly curious cURL users from quickly scanning the whole file system, where everything was otherwise neatly ordered in linear sequences.
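To see why linear numbering makes scanning so easy, here is a minimal sketch in Python that only builds the list of URLs a scanner would ask for; it sends no requests. The site `https://example.com/documents` and the function name `sequential_urls` are made up for illustration and do not come from the system discussed here.

```python
def sequential_urls(base_url, first_id, last_id):
    """Build one URL per identifier in the inclusive range first_id..last_id."""
    return [f"{base_url}/{item_id}" for item_id in range(first_id, last_id + 1)]


# Hypothetical site and path, for illustration only.
urls = sequential_urls("https://example.com/documents", 1, 4)
for url in urls:
    print(url)
```

If every identifier is simply the previous one plus one, knowing a single valid URL is enough to guess all the others.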
Figure 1: Nonlinear Sequence Trendline
When sampling the sequences, one can easily find a trend line using common spreadsheet applications (Figure 1). Solving the equation itself is more time-consuming, but it is only a matter of effort. Seriously, this is no defence against anyone who really wants to get those HTTP requests responding successfully, but oddly enough, it practically scares off the average incompetent users who did not do their maths at school! With a little bit of creativity, you can even get the checksums in neat order by using a logarithmic scale (Figure 2).
Figure 2: Sequence on Logarithmic Scale
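The trend-line exercise can also be done outside a spreadsheet. The sketch below is only an illustration under a loud assumption: that the sampled identifiers grow roughly exponentially, so that their logarithms fall on a straight line. The sample values are invented (they follow 3 * 2**n) and are not the data behind the figures; the function names are hypothetical too.

```python
import math


def fit_log_linear(indices, values):
    """Least-squares fit of log(value) = intercept + slope * index."""
    logs = [math.log(v) for v in values]
    n = len(indices)
    mean_x = sum(indices) / n
    mean_y = sum(logs) / n
    sxx = sum((x - mean_x) ** 2 for x in indices)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(indices, logs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(index, slope, intercept):
    """Map a position in the sequence back to an estimated identifier."""
    return math.exp(intercept + slope * index)


# Invented sample (assumption): identifiers that happen to follow 3 * 2**n.
sample_indices = [0, 1, 2, 3, 4]
sample_values = [3, 6, 12, 24, 48]

slope, intercept = fit_log_linear(sample_indices, sample_values)
next_guess = round(predict(5, slope, intercept))
print(next_guess)
```

A real sequence may well need a different model (polynomial, for instance), but the point stands: once a handful of valid identifiers is known, guessing the rest is a curve-fitting exercise.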
So, this is no defence at all, but good enough for its purposes. One can only guess why this specific web system did not introduce a proper hash-based, random and session-based mechanism to prevent cURL scanning completely. While cURL itself cannot do nonlinear scanning, it is a trivial task in other applications. Most probably the reason for this strange solution to the oddest problem was the lack of time and interest, and the famous business needs. It is good because “it works!”, while at the same time it does not work at all. It is also possible that this was a honeypot system, set up to be vulnerable in order to collect information and intelligence on bots interested in crawling such data.
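For comparison, a hash, random and session-based mechanism could look roughly like the following Python sketch. It is one possible shape under stated assumptions, not what the system in question did or should have done: identifiers are random instead of ordered, and each link carries an HMAC signature tied to the requesting session. All names here (`SERVER_KEY`, `new_document_id`, `sign_for_session`, `is_allowed`) are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-deployment secret; a real system would load it from
# protected configuration instead of generating it at import time.
SERVER_KEY = secrets.token_bytes(32)


def new_document_id():
    """Return a random, unguessable identifier instead of 1, 2, 3, ..."""
    return secrets.token_urlsafe(16)


def sign_for_session(document_id, session_id, key=SERVER_KEY):
    """Bind a document identifier to one session with an HMAC-SHA256 tag."""
    message = f"{session_id}:{document_id}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def is_allowed(document_id, session_id, signature, key=SERVER_KEY):
    """Accept the request only if the signature matches this session."""
    expected = sign_for_session(document_id, session_id, key)
    return hmac.compare_digest(expected, signature)
```

With something along these lines there is no sequence left to fit a trend line to, and a link copied from one session does not verify in another.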
Whatever the case was, it was a good demonstration that the inherent vulnerabilities of a protocol should not be left for random developers to cope with, but should be solved in the protocol itself.
Kristo Helasvuo, Guest Author.