On-the-Fly Content Type Sniffing and Validation in Go

Validating, sanitizing, and safely hosting user-generated content is a large and complex topic. Today we’ll look at just one aspect of it: using magic bytes to sniff the content type of user-uploaded files and reject those that do not match our validation rules.

This post focuses on two things:

How to sniff the content type without buffering an entire file in memory
How to make the code ergonomic and reusable

Using http.DetectContentType

The standard library has an http.DetectContentType function that does exactly what we need. Here’s its full description from the documentation:

DetectContentType implements the algorithm described at https://mimesniff.spec.whatwg.org/ to determine the Content-Type of the given data. It considers at most the first 512 bytes of data. DetectContentType always returns a valid MIME type: if it cannot determine a more specific one, it returns “application/octet-stream”.
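To get a feel for the function before wiring it into a handler, here is a small sketch that feeds it a few byte prefixes. The sniff helper is a hypothetical name introduced only for this illustration; it trims the input to 512 bytes, which is all the function looks at anyway.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff is a hypothetical helper: it passes at most the first 512 bytes
// of data to http.DetectContentType and returns the detected MIME type.
func sniff(data []byte) string {
	if len(data) > 512 {
		data = data[:512]
	}
	return http.DetectContentType(data)
}

func main() {
	// The 8-byte PNG signature is enough to identify a PNG file.
	fmt.Println(sniff([]byte("\x89PNG\r\n\x1a\n"))) // image/png

	// PDF files start with "%PDF-".
	fmt.Println(sniff([]byte("%PDF-1.7"))) // application/pdf

	// Unrecognized binary data falls back to the generic type.
	fmt.Println(sniff([]byte{0x01, 0x02, 0x03})) // application/octet-stream
}
```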

Let’s see how we could use this in practice. Consider a typical file upload handler that copies files directly to S3:

func handleUpload(w http.ResponseWriter, r *http.Request) {
	// Validate request headers
	// ...

	// Copy request body right into S3
	uploader := s3manager.NewUploader(sess)

	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("filename.jpg"),
		Body:   r.Body,
	})
	if err != nil {
		w.WriteHeader(500)
	}
}

Let’s see what it would look like if we called http.DetectContentType manually to allow only image uploads. Remember, we don’t want to buffer the entire file in memory:

	// Read the first chunk of the request body
	var first512 [512]byte
	n, err := io.ReadFull(r.Body, first512[:])
	if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
		w.WriteHeader(500)
		return
	}

	// Detect and validate the content type
	contentType := http.DetectContentType(first512[:n])
	if !strings.HasPrefix(contentType, "image/") {
		w.WriteHeader(400)
		return
	}

	// Reassemble the request body: the sniffed prefix followed by the rest
	reqBody := io.MultiReader(bytes.NewReader(first512[:n]), r.Body)

While this works, it has several drawbacks:

The code is verbose and error-prone
Error handling is scattered and difficult to maintain
We’d need to repeat this pattern in every upload handler

Let’s encapsulate this logic into a reusable component that handles the complexity for us.

What Would We Want Instead?

Let’s first look at what we’d like to achieve: a custom reader wrapper, created with NewContentTypeReader, that automatically detects the content type as the body is being read and passes it to a user-provided callback. The callback performs all necessary validation and optionally returns an error. Our HTTP handler would look like this:

var ErrNotImage = errors.New("not an image")

func handleUpload(w http.ResponseWriter, r *http.Request) {
	// Validate request headers
	// ...

	// A custom reader that detects and validates the content type
	reqBody := NewContentTypeReader(r.Body, func(contentType string) error {
		if !strings.HasPrefix(contentType, "image/") {
			return ErrNotImage
		}
		return nil
	})

	// Copy request body right into S3
	uploader := s3manager.NewUploader(sess)

	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("filename.jpg"),
		Body:   reqBody,
	})
	if errors.Is(err, ErrNotImage) {
		// Caught the error from the custom reader
		w.WriteHeader(400)
		return
	}
	if err != nil {
		w.WriteHeader(500)
	}
}

In this code, reading from reqBody fails with ErrNotImage as soon as enough data has arrived to tell that the request body does not look like an image.

Implementing the Reader Wrapper

Now that we’ve seen how we want to use the wrapper, let’s implement it. We could reuse the io.MultiReader approach from above, but it gets complicated when the first 512 bytes cannot be read because of a non-fatal error, such as an I/O timeout.

Instead, our wrapper proxies every Read call straight to the original reader and accumulates a copy of the data in an internal buffer until there’s enough to call http.DetectContentType.

type ctReader struct {
	buf     []byte
	reader  io.Reader
	handler func(contentType string) error
}

// NewContentTypeReader returns a reader that sniffs the content type and passes it to the handler.
func NewContentTypeReader(r io.Reader, handler func(contentType string) error) io.Reader {
	return &ctReader{
		reader:  r,
		handler: handler,
	}
}

func (r *ctReader) Read(p []byte) (n int, err error) {
	n, err = r.reader.Read(p)

	if r.handler != nil {
		// Accumulate the buffer
		r.buf = append(r.buf, p[:min(512, n)]...)

		// Buffer is large enough or EOF reached
		if len(r.buf) >= 512 || errors.Is(err, io.EOF) {
			contentType := http.DetectContentType(r.buf)
			if err2 := r.handler(contentType); err2 != nil {
				err = err2 // replace the original error
			}

			// Make sure we don't call the handler again
			r.handler = nil
			r.buf = nil
		}
	}
	return
}

Note: as written, the code can buffer up to 1023 bytes (a 511-byte read followed by a read of 512 bytes or more). That is possible in theory but unlikely in practice, where each read is usually several kilobytes long.

Conclusion

We’ve built a reusable solution for content-type detection during file uploads that:

Validates files on the fly without buffering them entirely in memory
Integrates with Go’s io.Reader interface and standard libraries
Works naturally with cloud storage services like Amazon S3

While magic byte detection is reliable for most common file types, remember that it’s just one layer of defense. For production systems, combine it with other security measures like file size limits, malware scanning, and proper access controls.
