On-the-Fly Content Type Detection in Go
Validating, sanitizing, and safely hosting user-generated content is a large and complex topic. Today we’ll look at just one aspect of it: using magic bytes to sniff the content type of user-uploaded files and reject those that do not match our validation rules.
This post focuses on two things:
- How to sniff the content type without buffering the entire file in memory
- How to make the code ergonomic and reusable
Using http.DetectContentType
The standard library has an http.DetectContentType function that does exactly what we need. Here’s its full description from the documentation:
DetectContentType implements the algorithm described at https://mimesniff.spec.whatwg.org/ to determine the Content-Type of the given data. It considers at most the first 512 bytes of data. DetectContentType always returns a valid MIME type: if it cannot determine a more specific one, it returns “application/octet-stream”.
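To get a feel for the function, here is a minimal sketch that feeds it a few hand-written byte prefixes. The helper name sniff and the sample inputs are assumptions made for this example; only http.DetectContentType comes from the standard library.

```go
package main

import (
	"fmt"
	"net/http"
)

// sniff is a hypothetical helper used only in this example; it simply
// forwards to http.DetectContentType.
func sniff(data []byte) string {
	return http.DetectContentType(data)
}

func main() {
	samples := []struct {
		name string
		data []byte
	}{
		{"PNG signature", []byte("\x89PNG\x0D\x0A\x1A\x0A")},
		{"GIF signature", []byte("GIF89a")},
		{"plain text", []byte("hello, world")},
		{"unknown binary", []byte{0x01, 0x02, 0x03}},
	}
	for _, s := range samples {
		fmt.Printf("%-15s -> %s\n", s.name, sniff(s.data))
	}
}
```

The first two samples should be reported as image/png and image/gif, and the unknown binary bytes should fall back to application/octet-stream, as the documentation describes.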
Let’s see how we could use this in practice. Consider a typical file upload handler that copies files directly to S3:
func handleUpload(w http.ResponseWriter, r *http.Request) {
	// Validate request headers
	// ...
	// Copy request body right into S3
	uploader := s3manager.NewUploader(sess)
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("filename.jpg"),
		Body:   r.Body,
	})
	if err != nil {
		w.WriteHeader(500)
	}
}
Let’s think about what it would look like if we used http.DetectContentType manually to allow only image uploads. Remember, we don’t want to buffer the entire file in memory:
// Read the first chunk of the request body
var first512 [512]byte
n, err := io.ReadFull(r.Body, first512[:])
if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
	w.WriteHeader(500)
	return
}
// Detect and validate the content type
contentType := http.DetectContentType(first512[:n])
if !strings.HasPrefix(contentType, "image/") {
	w.WriteHeader(400)
	return
}
// Reassemble the request body back
reqBody := io.MultiReader(bytes.NewReader(first512[:n]), r.Body)
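To check that the reassembled reader really yields the original bytes, the same steps can be pulled out of the handler and run against an in-memory body. This is a sketch under assumptions: sniffPrefix and roundTrip are hypothetical names introduced here, and a strings.Reader stands in for r.Body.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// sniffPrefix (hypothetical) reads up to 512 bytes, detects the content type,
// and returns a reader that replays those bytes followed by the rest of body.
func sniffPrefix(body io.Reader) (string, io.Reader, error) {
	var first512 [512]byte
	n, err := io.ReadFull(body, first512[:])
	// io.EOF and io.ErrUnexpectedEOF only mean the body is shorter than 512 bytes.
	if err != nil && !errors.Is(err, io.ErrUnexpectedEOF) && !errors.Is(err, io.EOF) {
		return "", nil, err
	}
	contentType := http.DetectContentType(first512[:n])
	return contentType, io.MultiReader(bytes.NewReader(first512[:n]), body), nil
}

// roundTrip (hypothetical) reports the detected type and whether the
// reassembled reader returned exactly the original payload.
func roundTrip(payload string) (string, bool) {
	contentType, body, err := sniffPrefix(strings.NewReader(payload))
	if err != nil {
		return "", false
	}
	all, err := io.ReadAll(body)
	if err != nil {
		return "", false
	}
	return contentType, string(all) == payload
}

func main() {
	contentType, intact := roundTrip("GIF89a" + strings.Repeat("x", 2000))
	fmt.Println(contentType, intact)
}
```

Both a body longer than 512 bytes and one shorter than that should come back unchanged, because io.MultiReader replays the sniffed prefix before continuing with the remaining stream.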
While this works, it has several drawbacks:
- The code is verbose and error-prone
- Error handling is scattered and difficult to maintain
- We’d need to repeat this pattern in every upload handler
Let’s encapsulate this logic into a reusable component that handles the complexity for us.
What Would We Want Instead?
Let’s first look at what we’d want to achieve. We’d like a custom reader wrapper, NewContentTypeReader, that automatically detects the content type as the body is being read and calls a user-provided callback. The callback performs all necessary validation and optionally returns an error. Our HTTP handler would look like this:
var ErrNotImage = errors.New("not an image")

func handleUpload(w http.ResponseWriter, r *http.Request) {
	// Validate request headers
	// ...
	// A custom reader that detects and validates the content type
	reqBody := NewContentTypeReader(r.Body, func(contentType string) error {
		if !strings.HasPrefix(contentType, "image/") {
			return ErrNotImage
		}
		return nil
	})
	// Copy request body right into S3
	uploader := s3manager.NewUploader(sess)
	_, err := uploader.Upload(&s3manager.UploadInput{
		Bucket: aws.String("my-bucket"),
		Key:    aws.String("filename.jpg"),
		Body:   reqBody,
	})
	if errors.Is(err, ErrNotImage) {
		// Caught the error from the custom reader
		w.WriteHeader(400)
		return
	}
	if err != nil {
		w.WriteHeader(500)
	}
}
In this code, reading from reqBody fails with ErrNotImage as soon as enough data has arrived to tell that the request body does not look like an image.
Implementing the Reader Wrapper
Now that we’ve seen how we want to use the wrapper, let’s look at how to implement it. We could use the same approach as above, with io.MultiReader, though it would complicate the cases where the first 512 bytes cannot be read because of a non-fatal error (such as an I/O timeout).
Instead, our wrapper proxies all Read calls straight to the original reader while accumulating an internal buffer until there is enough data to call http.DetectContentType.
type ctReader struct {
	buf     []byte
	reader  io.Reader
	handler func(contentType string) error
}

// NewContentTypeReader returns a reader that sniffs the content type and passes it to the handler.
func NewContentTypeReader(r io.Reader, handler func(contentType string) error) io.Reader {
	return &ctReader{
		reader:  r,
		handler: handler,
	}
}

func (r *ctReader) Read(p []byte) (n int, err error) {
	n, err = r.reader.Read(p)
	if r.handler != nil {
		// Accumulate the buffer
		r.buf = append(r.buf, p[:min(512, n)]...)
		// Buffer is large enough or EOF reached
		if len(r.buf) >= 512 || errors.Is(err, io.EOF) {
			contentType := http.DetectContentType(r.buf)
			if err2 := r.handler(contentType); err2 != nil {
				err = err2 // replace the original error
			}
			// Make sure we don't call the handler again
			r.handler = nil
			r.buf = nil
		}
	}
	return
}
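A quick way to exercise the wrapper without S3 is to read an in-memory body through it one byte at a time. The sketch below restates the reader so that it compiles on its own, with an explicit length check in place of the min builtin, which requires Go 1.21 or newer. The helper readAllAsImage and the sample payloads are assumptions made for this example, and iotest.OneByteReader from testing/iotest stands in for a slow network stream.

```go
package main

import (
	"errors"
	"fmt"
	"io"
	"net/http"
	"strings"
	"testing/iotest"
)

var ErrNotImage = errors.New("not an image")

type ctReader struct {
	buf     []byte
	reader  io.Reader
	handler func(contentType string) error
}

func NewContentTypeReader(r io.Reader, handler func(contentType string) error) io.Reader {
	return &ctReader{reader: r, handler: handler}
}

func (r *ctReader) Read(p []byte) (n int, err error) {
	n, err = r.reader.Read(p)
	if r.handler != nil {
		// Keep at most 512 bytes of this chunk for sniffing.
		chunk := p[:n]
		if len(chunk) > 512 {
			chunk = chunk[:512]
		}
		r.buf = append(r.buf, chunk...)
		if len(r.buf) >= 512 || errors.Is(err, io.EOF) {
			if err2 := r.handler(http.DetectContentType(r.buf)); err2 != nil {
				err = err2 // surface the validation error to the caller
			}
			r.handler = nil
			r.buf = nil
		}
	}
	return
}

// readAllAsImage (hypothetical) drains payload through the wrapper in
// one-byte reads and returns whatever error the consumer would see.
func readAllAsImage(payload string) error {
	src := iotest.OneByteReader(strings.NewReader(payload))
	body := NewContentTypeReader(src, func(contentType string) error {
		if !strings.HasPrefix(contentType, "image/") {
			return ErrNotImage
		}
		return nil
	})
	_, err := io.ReadAll(body)
	return err
}

func main() {
	fmt.Println(readAllAsImage("GIF89a" + strings.Repeat("x", 1000)))
	fmt.Println(readAllAsImage(strings.Repeat("plain text ", 100)))
}
```

The GIF-prefixed payload should be read to the end without error, while the text payload should stop with ErrNotImage once 512 bytes have accumulated. A payload shorter than 512 bytes is validated when the underlying reader reports io.EOF.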
Note: As written, the code above can buffer up to 1023 bytes (a 511-byte read followed by a read of 512 bytes or more). This is possible in theory but unlikely in practice, where each read is usually several kilobytes long.
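If the extra bytes matter, one possible variant copies only as many bytes as are still missing, so the buffer never exceeds 512 bytes. This is a sketch, not part of the implementation above: cappedReader, sniffLen, and bufferedAtDetection are hypothetical names, and the helper exists only to observe the buffer length at the moment the handler runs.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
	"net/http"
)

const sniffLen = 512 // http.DetectContentType looks at no more than this

type cappedReader struct {
	buf     []byte
	reader  io.Reader
	handler func(contentType string) error
}

func (r *cappedReader) Read(p []byte) (n int, err error) {
	n, err = r.reader.Read(p)
	if r.handler != nil {
		// Copy only the bytes still missing from the sniff buffer.
		room := sniffLen - len(r.buf)
		if room > n {
			room = n
		}
		r.buf = append(r.buf, p[:room]...)
		if len(r.buf) >= sniffLen || errors.Is(err, io.EOF) {
			if err2 := r.handler(http.DetectContentType(r.buf)); err2 != nil {
				err = err2
			}
			r.handler = nil
			r.buf = nil
		}
	}
	return
}

// bufferedAtDetection (hypothetical) issues reads of the given sizes against
// a 4096-byte source and reports how many bytes were buffered when the
// handler ran, or -1 if it never ran.
func bufferedAtDetection(readSizes ...int) int {
	seen := -1
	r := &cappedReader{
		buf:    make([]byte, 0, sniffLen), // preallocated, so append never grows it
		reader: bytes.NewReader(make([]byte, 4096)),
	}
	r.handler = func(string) error {
		seen = len(r.buf)
		return nil
	}
	for _, size := range readSizes {
		if _, err := r.Read(make([]byte, size)); err != nil {
			break
		}
	}
	return seen
}

func main() {
	fmt.Println(bufferedAtDetection(511, 512)) // prints 512
}
```

With the 511-byte and 512-byte reads from the note, the handler sees exactly 512 buffered bytes instead of 1023. The trade-off is one extra subtraction and comparison per read until detection completes.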
Conclusion
We’ve built a reusable solution for content-type detection during file uploads that:
- Validates files on-the-fly without buffering them entirely in memory
- Integrates with Go’s io.Reader interface and standard libraries
- Works naturally with cloud storage services like Amazon S3
While magic byte detection is reliable for most common file types, remember that it’s just one layer of defense. For production systems, combine it with other security measures like file size limits, malware scanning, and proper access controls.