Pitfalls when disabling SSL verification with InsecureSkipVerify in Golang

tl;dr When disabling SSL verification using InsecureSkipVerify, make sure you're not losing the default config of the default HTTP transport.

Thanks to Zak for troubleshooting this. :)


At engageSPARK we integrated with a bunch of vendors via HTTPS. Recently a vendor forgot to renew their SSL certificate, which caused connections to fail. While they were scrambling to install a renewed one, we decided to disable SSL verification in the meantime so that we could keep using their services.

For Golang, you disable SSL verification via an http.Transport. Stack Overflow has many answers explaining how to do that; here is one:

tr := &http.Transport{
    TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
}
client := &http.Client{Transport: tr}

So, that's what we did.

Soon after, during a spike of traffic in our staging system, that Go process ran out of file descriptors, and errors such as this one showed up:

Get https://someth.ing/: dial tcp someth.ing:443: socket: too many open files

This had not been a problem before, so instead of just increasing the file descriptors available, we investigated a bit more. As it turns out, the Stack Overflow answer above is flawed: It doesn't tell you that by creating a new Transport like that, you're losing crucial default configuration.

That's because usually HTTP requests use the default Transport of the net/http package:

var DefaultTransport RoundTripper = &Transport{
    Proxy: ProxyFromEnvironment,
    DialContext: (&net.Dialer{
      Timeout:   30 * time.Second,
      KeepAlive: 30 * time.Second,
      DualStack: true,
    }).DialContext,
    MaxIdleConns:          100,
    IdleConnTimeout:       90 * time.Second,
    TLSHandshakeTimeout:   10 * time.Second,
    ExpectContinueTimeout: 1 * time.Second,
}

As you can see, that transport contains a bunch of default configuration: How long to keep TCP connections alive, how long to keep idle connections around and how many, this kind of stuff. If you're creating a Transport as shown at the beginning, then you get zero-values for the other fields, and not those lovely defaults.
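To make the difference visible, here is a minimal sketch that prints a few of those fields for both transports. The helper names (bareTransport, defaultTransport, describe) are made up for this illustration, and the sketch assumes that nothing in your program has replaced http.DefaultTransport with a different RoundTripper.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net/http"
)

// bareTransport builds a Transport the way the snippet at the top does:
// only TLSClientConfig is set, every other field keeps its zero value.
func bareTransport() *http.Transport {
	return &http.Transport{
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
}

// defaultTransport returns the package-level default as a *http.Transport.
// The second result is false if the program swapped in another RoundTripper.
func defaultTransport() (*http.Transport, bool) {
	t, ok := http.DefaultTransport.(*http.Transport)
	return t, ok
}

// describe summarizes the fields discussed in this post.
func describe(name string, t *http.Transport) string {
	return fmt.Sprintf("%s: custom dialer=%t MaxIdleConns=%d IdleConnTimeout=%s TLSHandshakeTimeout=%s",
		name, t.DialContext != nil, t.MaxIdleConns, t.IdleConnTimeout, t.TLSHandshakeTimeout)
}

func main() {
	if def, ok := defaultTransport(); ok {
		fmt.Println(describe("default", def))
	} else {
		fmt.Println("DefaultTransport has been replaced by another RoundTripper")
	}
	fmt.Println(describe("bare", bareTransport()))
}
```

The "bare" line should show no custom dialer and zero for every timeout, which is exactly the configuration we ended up running with.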

In our case, we lost the default value for the Timeout of the Dialer: How long the OS should be waiting for a connection attempt to succeed, before it considers it failed. The value used in the DefaultTransport is 30 seconds, but if omitted the OS default kicks in, which on Ubuntu seems to be 1 minute. On our staging system, we were testing some error scenarios, one of which was to connect to a server that doesn't respond. Before, when connecting to a dead server, the connection attempt would have been considered failed after 30 seconds. With our changed Transport, we waited for twice as long, keeping the connection open and the file descriptor in use—until we ran out of the latter.

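You can look at a related kernel timeout on Ubuntu like this. (Strictly speaking, tcp_fin_timeout governs how long orphaned sockets linger in FIN-WAIT-2; how long an unanswered connection attempt is retried is controlled by net.ipv4.tcp_syn_retries, which you can query the same way.)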

$ sysctl net.ipv4.tcp_fin_timeout
net.ipv4.tcp_fin_timeout = 60

So, finally, how do you avoid losing the default configuration? Either copy the code for creating the DefaultTransport, making sure you're using the code from your Go version (replicated above, and here is the 1.9 version), or use the values from the default transport, as this answer to the above SO question suggests. And then change the TLS config. (And don't forget to consider the other fields of the TLS config, either.) Do note that by reusing the defaults and then customizing the settings, you may be changing the defaults for other transports, too, at least where pointers are used.
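As a rough sketch of the "copy the defaults" approach: the field values below are the ones from the DefaultTransport listing earlier in this post, so check them against the source of your own Go version before relying on them. The function name newInsecureTransport is made up for this example, and DualStack is left out on the assumption that you are on a newer Go release, where that field is deprecated.

```go
package main

import (
	"crypto/tls"
	"fmt"
	"net"
	"net/http"
	"time"
)

// newInsecureTransport spells out the defaults listed in this post and adds
// the TLS override on top. It creates its own Dialer, so nothing is shared
// with http.DefaultTransport.
func newInsecureTransport() *http.Transport {
	return &http.Transport{
		Proxy: http.ProxyFromEnvironment,
		DialContext: (&net.Dialer{
			Timeout:   30 * time.Second, // give up on a dead server after 30s
			KeepAlive: 30 * time.Second,
		}).DialContext,
		MaxIdleConns:          100,
		IdleConnTimeout:       90 * time.Second,
		TLSHandshakeTimeout:   10 * time.Second,
		ExpectContinueTimeout: 1 * time.Second,
		// The one setting we actually wanted to change.
		TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
	}
}

func main() {
	tr := newInsecureTransport()
	client := &http.Client{Transport: tr, Timeout: 2 * time.Second}
	fmt.Println("client timeout:", client.Timeout)
	fmt.Println("idle conn timeout:", tr.IdleConnTimeout)
	fmt.Println("verification skipped:", tr.TLSClientConfig.InsecureSkipVerify)
}
```

Because every value is written out, this transport does not change when the defaults in the standard library change, which is the maintenance cost of this approach.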

This is not very satisfactory: Instead of copying the entire transport and all subfields, I'd have preferred to keep the good defaults and only partially overwrite what I need, but neither the DefaultTransport of net/http nor its DialContext, which is a function value, seems to be built for that. One possibility may be to deep-clone the DefaultTransport, but if we want to change a value in the Dialer, we'd still have to instantiate a new struct.
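If you are on Go 1.13 or later, http.Transport has a Clone method that covers the deep-clone idea. The following is only a sketch under that assumption. The names insecureClone and defaultSkipsVerification are made up for this example.

```go
package main

import (
	"crypto/tls"
	"errors"
	"fmt"
	"net/http"
)

// insecureClone copies the default transport (Go 1.13+) and disables
// certificate verification on the copy only.
func insecureClone() (*http.Transport, error) {
	def, ok := http.DefaultTransport.(*http.Transport)
	if !ok {
		return nil, errors.New("http.DefaultTransport is not a *http.Transport")
	}
	tr := def.Clone()
	if tr.TLSClientConfig == nil {
		tr.TLSClientConfig = &tls.Config{}
	}
	tr.TLSClientConfig.InsecureSkipVerify = true
	return tr, nil
}

// defaultSkipsVerification reports whether the shared default transport has
// certificate verification disabled. It should stay false after cloning.
func defaultSkipsVerification() bool {
	def, ok := http.DefaultTransport.(*http.Transport)
	return ok && def.TLSClientConfig != nil && def.TLSClientConfig.InsecureSkipVerify
}

func main() {
	tr, err := insecureClone()
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("clone MaxIdleConns:", tr.MaxIdleConns)
	fmt.Println("clone skips verification:", tr.TLSClientConfig.InsecureSkipVerify)
	fmt.Println("default skips verification:", defaultSkipsVerification())
}
```

The clone keeps the default DialContext, so the 30 second dial timeout survives. Changing a Dialer value still means assigning a DialContext from a new net.Dialer, as noted above.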

Reproducing the problem

If you're interested in reproducing the problem, feel free to use the following script. Comments (1) and (2) indicate the two possibilities of overwriting the default Transport, or not.

package main

import (
    "bufio"
    "crypto/tls"
    "fmt"
    "io/ioutil"
    "net/http"
    "os"
    "sync"
    "time"
)

func main() {
    if len(os.Args) < 2 {
            fmt.Println("Please specify URL")
            os.Exit(1)
    }

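    // -- (1) -- Custom Transport: only the TLS config is set, so all other
    // fields (including the Dialer timeout) keep their zero values.
    tr := &http.Transport{
            TLSClientConfig: &tls.Config{InsecureSkipVerify: true},
    }
    client := &http.Client{Transport: tr, Timeout: 2 * time.Second}

    // OR
    // -- (2) -- No Transport given: the client uses http.DefaultTransport.
    // client := &http.Client{Timeout: 2 * time.Second}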

    var wg sync.WaitGroup

    for i := 0; i < 10; i++ {
            wg.Add(1)
            go func(j int) {
                    fmt.Println(j, "Starting request.")
                    resp, err := client.Get(os.Args[1])
                    if err != nil {
                            fmt.Println(err)
                    } else {
                            defer resp.Body.Close()

                            _, errRead := ioutil.ReadAll(resp.Body)
                            if errRead != nil {
                                    fmt.Println(errRead)
                            }

                    }
                    fmt.Println(j, "Done with request.")
                    wg.Done()
            }(i)
    }

    wg.Wait()
    fmt.Println("Waiting 30s")
    time.Sleep(30 * time.Second)

    fmt.Println("Done, please press RETURN to end the process")
    reader := bufio.NewReader(os.Stdin)
    fmt.Print("Enter text: ")
    text, _ := reader.ReadString('\n')
    fmt.Println(text)
}

Save it as insecure.go and run it like so, assuming you have Docker:

docker run -it --rm -v "${GOPATH}":/go -v "${PWD}":/go/src/play -w /go/src/play golang go build insecure.go  && ./insecure https://192.168.9.1/

The URL at the end should point to an unreachable host, so that after two seconds you get errors such as:

Get https://192.168.9.1/: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)

When the message indicates that the 30s are up, find out how many file descriptors your process has open, by using lsof:

lsof -p $(pidof insecure)

It should show ten lines like so:

insecure 22330 murat    7u  IPv4 4442172      0t0     TCP 10.11.14.102:52548->192.168.9.1:https (SYN_SENT)

Those are connection attempts that are still unacknowledged (SYN_SENT).
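If you'd rather have the process report its own descriptor count than run lsof from the outside, a Linux-only sketch like the following could be dropped into the test program. It assumes Go 1.16 or later (for os.ReadDir) and a mounted /proc. The function name openFDs is made up for this example.

```go
package main

import (
	"fmt"
	"os"
)

// openFDs counts the entries in /proc/self/fd, i.e. the file descriptors
// currently open in this process. The count includes the descriptor that is
// briefly opened to read the directory itself.
func openFDs() (int, error) {
	entries, err := os.ReadDir("/proc/self/fd")
	if err != nil {
		return 0, err
	}
	return len(entries), nil
}

func main() {
	n, err := openFDs()
	if err != nil {
		fmt.Println("could not count descriptors:", err)
		return
	}
	fmt.Println("open file descriptors:", n)
}
```

Unlike lsof, this only gives you a number and not the connection state, so it is more of a quick sanity check.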

Terminate the process by hitting return (the process waits for that, so that you can inspect it easily).

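Now, switch to version (2) of the http.Client in the code: comment out the Transport and client of version (1), uncomment the client of version (2), and remove the now-unused crypto/tls import so that the program still compiles. Run the program again.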

After 30 seconds, lsof should show no more open connections, as the default dial timeout of 30 seconds should have closed them. The file descriptors will be released, too.

If you want to max out the file descriptors, first find out how many are available for your process. While the process is still running, look at its limits:

cat /proc/$(pidof insecure)/limits

You're looking for a line like so:

Max open files            1024                 65536                files

Then, increase the number of loop iterations to something higher than the soft limit (the first number); in this case, 1200 would work.
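The same limits can also be read from inside a Go program. This is a minimal sketch for Unix-like systems using syscall.Getrlimit. The function name fdLimits is made up for this example.

```go
package main

import (
	"fmt"
	"syscall"
)

// fdLimits returns the soft and hard limit for open files (RLIMIT_NOFILE)
// of the current process.
func fdLimits() (soft, hard uint64, err error) {
	var rl syscall.Rlimit
	if err = syscall.Getrlimit(syscall.RLIMIT_NOFILE, &rl); err != nil {
		return 0, 0, err
	}
	return uint64(rl.Cur), uint64(rl.Max), nil
}

func main() {
	soft, hard, err := fdLimits()
	if err != nil {
		fmt.Println("could not read limits:", err)
		return
	}
	fmt.Printf("Max open files: soft=%d hard=%d\n", soft, hard)
}
```

One caveat: newer Go releases (1.19 and later, as far as I know) raise the soft limit towards the hard limit at startup, so the number reported here, and the number of iterations you need to exhaust it, may be higher than what your shell's ulimit shows.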