I'm writing an authorisation library for AWS in Rust and Python. I want to parse requests sent to AWS public endpoints, say s3.amazonaws.com, and figure out whether I should forward them or not.

Instead of writing my own proxy [1], I thought I'd integrate my library with the Squid proxy.

Squid was initially released in 1996. Written in C++, it has survived the test of time, in that it is still widely used.

For example, AWS has published use cases with Squid as an outbound proxy, and GCP has a similar one using Squid for egress control.

I want to use Squid for something similar.

I started my integration journey with external_acl_type.

Squid runs instances of a program of your choosing, and you define which parameters it passes to that program.

Squid expects a response: OK if the request should be allowed, or ERR if the request should be denied.

Here's an example program that allows all requests:

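import sys

# Squid sends one request per line and waits for one reply per line.
for line in sys.stdin:
    if line.strip():
        sys.stdout.write("OK\n")
        sys.stdout.flush()  # reply immediately, otherwise Squid keeps waiting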

Squid runs the authoriser in the background. In your Squid config you set up something like external_acl_type my_auth_helper %SRC my_authoriser. %SRC passes the client's source IP to my_authoriser, one request per line on stdin. That works pretty well, but only when you're not dealing with HTTPS requests.
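As a rough sketch of a helper that actually makes a decision, the version below only allows clients from an allow-list of networks. It assumes %SRC is the only format token and that the concurrency option is not set, so each input line is just the source IP. ALLOWED_NETWORKS, decide and serve are names made up for illustration.

```python
import ipaddress
import sys

# Hypothetical allow-list; replace with your own client networks.
ALLOWED_NETWORKS = [ipaddress.ip_network("10.0.0.0/8")]


def decide(line: str) -> str:
    """Return "OK" or "ERR" for one helper input line holding a source IP."""
    try:
        src = ipaddress.ip_address(line.strip())
    except ValueError:
        return "ERR"  # deny anything we cannot parse
    return "OK" if any(src in net for net in ALLOWED_NETWORKS) else "ERR"


def serve(inp=sys.stdin, out=sys.stdout) -> None:
    """Answer one line per request until Squid closes our stdin."""
    for line in inp:
        out.write(decide(line) + "\n")
        out.flush()  # Squid waits for each reply, so don't buffer
```

To run it under Squid, call serve() at the bottom of the script. Denying unparseable input is a deliberate choice here: failing closed seems safer for an authoriser.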

Squid cannot send HTTPS request headers to an external_acl_type [2], for reasons unknown. The "classical" way of dealing with this in Squid is ICAP.

ICAP stands for Internet Content Adaptation Protocol and has been defined in RFC 3507 since 2003. Two decades later it still seems to be in use, notably by a crowd that does anti-virus scanning on egress traffic. In Squid, it's all the icap_ and adaptation_ directives.

ICAP is aimed at being "a lightweight protocol for executing a 'remote procedure call' on HTTP messages".

It is nothing like that.

ICAP is symbiotic to HTTP. A wrapper. It has two main modes of operation [3]: request modification (REQMOD) and response modification (RESPMOD). In REQMOD, the request is forwarded to the ICAP server for a decision.

[Figure: REQMOD sequence diagram]

In RESPMOD, the response from the origin webserver is forwarded to the ICAP server for a decision.

[Figure: RESPMOD sequence diagram]

Squid has its own ICAP client.

In REQMOD, when Squid receives an HTTP request from a client, it crafts a request like the following for the ICAP server.

   REQMOD icap://icap-server.net/server?arg=87 ICAP/1.0
   Host: icap-server.net
   Encapsulated: req-hdr=0, null-body=76

   GET / HTTP/1.1
   Host: www.example.com
   Accept: text/html, text/plain

The Encapsulated header lists byte offsets, a bit like Content-Length in HTTP. req-hdr=0 tells us that the encapsulated HTTP request headers start at byte 0. null-body=76 tells us that no HTTP body follows and that there are 76 bytes of headers.

The core of the REQMOD ICAP request is the encapsulated section that the Encapsulated header describes: the HTTP headers plus the HTTP body, if there is any. An encapsulated body is chunked and terminates with 0\r\n\r\n

The ICAP server would send a response back:
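To make the offsets concrete, here is a minimal sketch that splits an Encapsulated header value into (section, offset) pairs. The function name parse_encapsulated is my own, and it does no validation beyond expecting name=number entries.

```python
def parse_encapsulated(value: str) -> list[tuple[str, int]]:
    """Split an Encapsulated header value into (section, offset) pairs."""
    sections = []
    for part in value.split(","):
        name, _, offset = part.strip().partition("=")
        sections.append((name, int(offset)))
    return sections


print(parse_encapsulated("req-hdr=0, null-body=76"))
# [('req-hdr', 0), ('null-body', 76)]
```

The last pair always names the body section (or null-body), so its offset doubles as the total length of the encapsulated headers.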

   ICAP/1.0 200 OK
   Server: ICAP-Server-Software/1.0
   Connection: close
   Encapsulated: req-hdr=0, null-body=197

   GET /modified-path HTTP/1.1
   Host: www.example.com
   Via: 1.0 icap-server.net (ICAP Example ReqMod Service 1.1)
   Accept: text/html, text/plain, image/gif
   Accept-Encoding: gzip, compress

In this example the ICAP server has modified the request, but it can also answer with a 403 Forbidden, or with a 204 No Content to say that no modification is needed.

This behaviour seems to fit the bill. I'll set up an ICAP server that will inspect the HTTPS headers forwarded by Squid. Even better, someone has written an ICAP parsing library. Huzzah!
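For the deny case, RFC 3507 has the server wrap an HTTP 403 response inside an ICAP 200 reply. The sketch below builds such a reply with no HTTP body. deny_response and the default ISTag value are placeholders of mine, not anything Squid mandates.

```python
def deny_response(istag: str = "example-tag-1") -> bytes:
    """Build an ICAP 200 reply that encapsulates a body-less HTTP 403."""
    http_headers = (
        b"HTTP/1.1 403 Forbidden\r\n"
        b"Content-Length: 0\r\n"
        b"\r\n"
    )
    icap_headers = (
        "ICAP/1.0 200 OK\r\n"
        f'ISTag: "{istag}"\r\n'
        # null-body's offset is the byte length of the encapsulated headers
        f"Encapsulated: res-hdr=0, null-body={len(http_headers)}\r\n"
        "\r\n"
    ).encode("ascii")
    return icap_headers + http_headers
```

Computing the offset from len(http_headers) rather than hard-coding it keeps the Encapsulated header honest if the HTTP headers change.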


// In the real server, buf is filled with bytes read from Squid's connection.
let mut buf = Vec::new();
let mut icap_headers = [ICAP_EMPTY_HEADER; 16];
let mut icap_request = ICAPRequest::new(&mut icap_headers);

// We parse the ICAP request first
match icap_request.parse(&buf) {
    Ok(icaparse::Status::Complete(_)) => {
        // icap_request.encapsulated_sections contains our HTTP request
        // send response
        let response = "ICAP/1.0 204 No Content\r\n\r\n";
    }
    _ => {
        // partial or malformed request: not handled here
    }
}

You might notice that, as with HTTP, an ICAP parse can come back Partial when only part of the request has arrived, but we'll pretend that this sort of thing can never happen. The RFC makes chunking of encapsulated bodies mandatory, but encapsulated headers are not chunked, and I only care about encapsulated headers. Interestingly enough, the mandatory use of chunking is what stopped the authors from building ICAP on top of HTTP as an application-layer protocol.

Here's the Squid config to forward all requests to the ICAP server:
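If you'd rather not pretend, one way to handle partial input is to return nothing until both the ICAP headers and the encapsulated HTTP headers have fully arrived. This is a sketch in Python rather than with the Rust library, and encapsulated_http_headers is a name of my own.

```python
def encapsulated_http_headers(buf: bytes):
    """Return the encapsulated HTTP header bytes, or None if buf is incomplete."""
    icap_head, sep, rest = buf.partition(b"\r\n\r\n")
    if not sep:
        return None  # ICAP headers not fully received yet
    body_offset = None
    for line in icap_head.split(b"\r\n")[1:]:  # skip the ICAP request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"encapsulated":
            # The last entry (req-body, null-body, ...) marks where headers end.
            last = value.decode("ascii").split(",")[-1]
            body_offset = int(last.partition("=")[2])
    if body_offset is None:
        raise ValueError("missing Encapsulated header")
    if len(rest) < body_offset:
        return None  # encapsulated headers still arriving
    return rest[:body_offset]
```

A caller would keep appending socket reads to buf and retry until the function stops returning None.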

icap_enable on
icap_service iservice reqmod_precache bypass=0 icap://127.0.0.1:1344

That seems straightforward. Which is why Squid never forwarded any HTTP request to my ICAP server. Baffling. Remember how I said ICAP had two main modes? Well, I lied. I discovered that Squid sends a third method, OPTIONS.

Before Squid forwards the first HTTP request, it asks the ICAP server for its "options": whether it's a REQMOD or RESPMOD server, what the date is, what the ICAP server is called, that sort of thing. The c-icap-client tool is very handy here [4]: it can craft an OPTIONS ICAP request.

Let's craft our OPTIONS response:

const OPTIONS: &[u8] = r#"ICAP/1.0 200 OK
Methods: REQMOD
Service: Rust ICAP Server
Allow: 204
ISTag: RustICAPServer
Encapsulated: null-body=0

"#
.as_bytes();

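Rust's raw string literals are neat like that. We don't need to mess around with escaping \r\n. Strictly speaking, the line breaks inside the literal are bare \n rather than the \r\n that RFC 3507 specifies, so this leans on the ICAP client being lenient about line endings.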
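Putting the pieces together, the server has to pick a reply based on the method on the ICAP request line. The sketch below shows one way to do that in Python, with explicit \r\n line endings. It reuses the same header values as the OPTIONS response above; respond and the constant names are made up for illustration.

```python
CRLF = "\r\n"

OPTIONS_RESPONSE = CRLF.join([
    "ICAP/1.0 200 OK",
    "Methods: REQMOD",
    "Service: Rust ICAP Server",
    "Allow: 204",
    "ISTag: RustICAPServer",
    "Encapsulated: null-body=0",
    "", "",  # the two empty items produce the terminating blank line
]).encode("ascii")

NO_CONTENT = b"ICAP/1.0 204 No Content\r\n\r\n"


def respond(request: bytes) -> bytes:
    """Pick a reply from the method on the ICAP request line."""
    method = request.split(b" ", 1)[0].upper()
    if method == b"OPTIONS":
        return OPTIONS_RESPONSE
    if method == b"REQMOD":
        return NO_CONTENT  # let the request through unmodified
    # This service only advertises REQMOD, so refuse anything else.
    return b"ICAP/1.0 405 Method Not Allowed\r\n\r\n"
```

A real server would parse the REQMOD request and decide between allowing and denying it; here every REQMOD is simply allowed.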

It's inevitable that someone will write a fully fledged ICAP server in Rust. For now my implementation works pretty well.


[1] I still might end up writing my own, because it could be simpler. There aren't many pluggable MiTM proxies out there.

[2] Quite possibly. There's a chance I haven't unlocked the right arcane incantation that is the Squid configuration.

[3] It's actually three as I found out later.

[4] For example c-icap-client -i 127.0.0.1 -p 1344 -s service_name.