HTTP Server Guide & How To Make One

idonotexistatall

Apr 24, 2014

CPOL

How to make an HTTP Server and some documentation on server-side protocol

Introduction

The world of HTTP is a poorly documented place. The only really good resource is RFC2616, and that is a monster that's difficult to read and understand. I hope this guide gives those of you who want to make your own web server, handle HTTP requests, or just learn how it all works a place to come and find out how to do it.

Resources

Before I get started, here are some resources that I found useful and will mention along the way.

RFC2616

http://www.w3.org/Protocols/rfc2616/rfc2616.html

This is the official specification for HTTP 1.1. It's big, scary, and hard to read, but it should contain any information you can't find anywhere else.

Some particularly useful parts of it are:

http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html [status codes]
http://www.w3.org/Protocols/rfc2616/rfc2616-sec9.html [HTTP methods]

Open Source Projects

Open source projects are a GREAT way to learn more. I learn a ton from examples, rather than from just being told or reading about it. The issue is that many projects, like HTTPD, are massive and complex, which makes them very difficult to understand and learn from. You can look at my server's code if you want - it's very simple and can be found here.

Introduction

The code in this guide is written in C#, but the protocol details are the same in any language.

Formatting Rules

Aside from the rules given below, there is 1 other major rule. ALL new lines MUST be CRLF (Carriage Return, Line Feed), NOT just the normal LF (this means rather than \n, it must be \r\n).

Also, commands from the client SHOULD be in all caps, so POST, GET, OPTIONS, etc. RFC2616 treats the command as case-sensitive, although you can still choose to be lenient and handle lower case.

Headers

The very basis of HTTP is the header. The header is extremely important, and also pretty simple. The format of a header is this:

Status line
Information: Value
Information: Value
...

Forming a Status Line

A status line is formed in this format:

HTTP-Version Status-Code Message

For most usage, you just want to use the HTTP version HTTP/1.1. However, this tells the client that you support HTTP/1.1 and are FULLY compliant with it, so keep that in mind, because that's what the browser or other program will expect. The status codes are listed and described in the RFC2616 status code section linked above in Resources. The one you will use most, however, is 200. 200 means all is going well and you will give the expected response. Last is the message. The message is just a short description of what the status code means. Since 200 means all is going well, its message is normally OK. Here's a standard response status line, and you will be using it all the time:

HTTP/1.1 200 OK

So, to recap: use HTTP/1.1, and make sure you are compliant with the specification before you release the application or put it in a production environment. For now, just leave it as is and don't worry about it. The status code tells the client what your server's status is and what kind of response to expect, and the message is just a description of the status.

The Rest of the Header

The rest of the header is just in the format of Item: Value. We will continue to cover this later. You should ALWAYS send Server, Content-Type, and Date with ANYTHING. You should tell the client as much as possible, but these three are important and rather easy to do.

An Example Header

HTTP/1.1 200 OK 
Server: ExampleServer
Connection: close
Content-Type: text/html

Client Commands

The client sends commands to tell your server what to do. Responding to these commands is everything your server does. Client commands are formatted like this:

[Command] [Resource] [HTTP Version] {body}

The command is what the client wants you to do. The main one is GET, which tells your server to give the client a webpage or file. The resource is the location or query of what it wants you to give it, and the HTTP version is the same kind as the one in the header. At the time of writing, commands are backwards compatible between 1.1 and 1.0, so you can ignore the version for now (although it would be best to handle it later on). The body is any extra information the client wants to send you. For example, on a PUT command, this would be the file contents.
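
To make the format concrete, here is a rough sketch of splitting the first line of a client command into its three parts. The names RequestLine and TryParse are made up for this illustration, and it only looks at the first line, not at anything that follows it.

```csharp
using System;

public static class RequestLine
{
    // Splits a line such as "GET /test.html HTTP/1.1" into its three parts.
    // Returns false when the line does not have exactly three
    // space-separated parts or the last part does not start with "HTTP/".
    public static bool TryParse(string line, out string command,
                                out string resource, out string version)
    {
        command = null;
        resource = null;
        version = null;
        if (string.IsNullOrEmpty(line)) return false;

        string[] parts = line.Split(' ');
        if (parts.Length != 3) return false;
        if (!parts[2].StartsWith("HTTP/", StringComparison.Ordinal)) return false;

        command = parts[0];
        resource = parts[1];
        version = parts[2];
        return true;
    }
}
```

A false result is a natural place to answer with a 400 error instead of guessing what the client meant.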

Responses

A response you give back to the client is in the format of [header] {body}. It's also very simple. The header is the header described above, and the body is either a file or an error message. There MUST be 2 new lines (\r\n\r\n) between the header and body: one ends the last header line, and one is an empty line that marks the end of the header!
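
As a small sketch of that layout, the helper below joins a status line, some Item: Value pairs, and a text body, with the empty line in between. ResponseText and Build are hypothetical names, and the sketch assumes the body is plain text rather than binary data.

```csharp
using System.Collections.Generic;
using System.Text;

public static class ResponseText
{
    // Joins a status line, header items and a text body into one response.
    public static string Build(string statusLine,
                               IDictionary<string, string> items,
                               string body)
    {
        var sb = new StringBuilder();
        sb.Append(statusLine).Append("\r\n");
        foreach (KeyValuePair<string, string> item in items)
        {
            sb.Append(item.Key).Append(": ").Append(item.Value).Append("\r\n");
        }
        sb.Append("\r\n"); // the empty line that ends the header
        sb.Append(body);
        return sb.ToString();
    }
}
```

For example, a status line of "HTTP/1.1 200 OK" with a single Content-Type item of text/plain and a body of "hi" comes out as the status line, the Content-Type line, an empty line, and then "hi".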

Step-by-step

Now, for how to do it on a step-by-step basis. I'll include stuff for networking newbies too, but only to a degree. This is in C#, but you should be able to adapt it into your own language.

Get

You need to start up your web server by actually accepting connections, right? Otherwise it's not even a server! First things first, though: you need to open a TcpListener. I expect you understand IPs and ports, so I won't explain them.

TcpListener listener = new TcpListener(IPAddress.Parse(LocalIP), Port);

listener.Start();

Then you put this in a listening loop.

Socket socket = listener.AcceptSocket();

Now, it will wait until it receives a connection, and once it has one you will have the socket for that connection. It WILL NOT ACCEPT NEW CONNECTIONS UNTIL YOU FINISH HANDLING THAT ONE, so you MUST multithread it:

new Thread(new ThreadStart(handler.HandleRequest)).Start();
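
Putting the last few snippets together, a listening loop with one thread per connection might look roughly like this. AcceptLoop, Run, and the handle callback are names invented for this sketch, and the choice to leave the loop when the listener is stopped is an assumption about how you want to shut down.

```csharp
using System;
using System.Net.Sockets;
using System.Threading;

public static class AcceptLoop
{
    // Accepts connections until the listener is stopped, handing each
    // socket to its own thread so the loop can go straight back to waiting.
    public static void Run(TcpListener listener, Action<Socket> handle)
    {
        while (true)
        {
            Socket socket;
            try
            {
                socket = listener.AcceptSocket(); // blocks until a client connects
            }
            catch (SocketException) { break; }           // listener was stopped
            catch (ObjectDisposedException) { break; }   // listener was stopped
            catch (InvalidOperationException) { break; } // listener was never started

            Socket client = socket;
            Thread worker = new Thread(() =>
            {
                try { handle(client); }
                finally { client.Close(); } // always release the connection
            });
            worker.IsBackground = true; // don't keep the process alive on exit
            worker.Start();
        }
    }
}
```

One thread per connection is the simplest thing that works for a small server like this one. A thread pool or async sockets are other options if you expect a lot of clients.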

Now you get the data that was sent by doing this:

netstream = new NetworkStream(socket);
reader = new StreamReader(netstream);
request = reader.ReadLine();

The NetworkStream is the stream of incoming data, and the StreamReader lets you turn that stream into usable, readable text. The request is the first line of the client request, as described above in Client Commands.
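
ReadLine only returns the first line of what the client sent. Clients normally follow that line with their own Item: Value lines and then an empty line. A possible sketch of reading those, where HeaderReader and ReadHeaders are hypothetical names and malformed lines are simply skipped:

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public static class HeaderReader
{
    // Reads "Item: Value" lines until the empty line that ends the header
    // (or until the stream ends). Item names are matched case-insensitively.
    public static Dictionary<string, string> ReadHeaders(TextReader reader)
    {
        var headers = new Dictionary<string, string>(StringComparer.OrdinalIgnoreCase);
        string line;
        while (!string.IsNullOrEmpty(line = reader.ReadLine()))
        {
            int colon = line.IndexOf(':');
            if (colon <= 0) continue; // no item name: skip the line

            string name = line.Substring(0, colon).Trim();
            string value = line.Substring(colon + 1).Trim();
            headers[name] = value;
        }
        return headers;
    }
}
```

You would call this with the same StreamReader right after reading the request line, so that anything left in the stream afterwards is the body.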

Now we can use that request and serve our first page! The below is self explanatory.

if (request.StartsWith("GET"))
{
    // Request handling
}

Inside of that if statement, you can handle your request for a file.

Now, I hope you understand network programming enough to implement your own version of this, but this is what I use to write to the stream and send data to the client:

// Needs: using System.Collections.Generic; using System.Text;
private List<byte> buffer = new List<byte>();

// Queues a string to be sent, encoded as UTF-8 bytes.
private void WriteToBuffer(String item)
{
   buffer.AddRange(Encoding.UTF8.GetBytes(item));
}

// Sends everything queued so far to the client and empties the buffer.
private void WriteBuffer()
{
   byte[] bytes = buffer.ToArray();
   netstream.Write(bytes, 0, bytes.Length);
   buffer.Clear();
}

Because the client command line uses spaces to separate its parts, the element after the first space is always the resource location, and you can get it using this:

request.Split(' ')[1]

Now you write your full response as described earlier. This is an example of one I use. As you can see, it has the full HTTP version, status code, message, and some header items: the server name, a notification that the connection will close after this response, the date and time (DON'T do it the way I do - it's WRONG. I just haven't found the correct way to implement it yet - I will update this later. Thankfully, having it correct isn't vital), the options list, and the MIME type. Yours should have the HTTP version, status code, status message, and content type at a MINIMUM. Use text/html for normal HTML documents and text/plain for error messages until you understand MIME types more.

WriteToBuffer("HTTP/1.1 200 OK\r\nServer: " + ServerName + "\r\nConnection: close\r\n" + "Date: " + DateTime.UtcNow + "\r\n" + "Allow: OPTIONS,GET,TRACE,HEAD\r\n" + "Content-Type: text/html\r\n\r\n");
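
On the Date value: HTTP dates use the RFC 1123 layout, for example "Thu, 24 Apr 2014 12:00:00 GMT". In .NET the "r" standard format string produces that layout, so one possible approach looks like the sketch below. HttpDate and Format are hypothetical names, and the caller is assumed to pass a UTC time, because "r" does not convert the value to UTC by itself.

```csharp
using System;
using System.Globalization;

public static class HttpDate
{
    // Formats a UTC time in the RFC 1123 layout used by the Date header.
    // The value is not converted, so pass DateTime.UtcNow (not DateTime.Now).
    public static string Format(DateTime utc)
    {
        return utc.ToString("r", CultureInfo.InvariantCulture);
    }
}
```

With this, the Date part of the response would become "Date: " + HttpDate.Format(DateTime.UtcNow) + "\r\n".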

Next, you put the body- that is, the HTML file. You check if the file exists, and if it does, you write the file to the stream after the above header.

WriteToBuffer(File.ReadAllText("." + request.Split(' ')[1]));

So the file at that location exists, and you've read it and added it to the buffer! Now you're ALMOST ready to go... but that's only if the file DOES exist. What if it doesn't? Then you want to send a 404 error like this:

WriteToBuffer("HTTP/1.1 404 Not Found\r\nServer: " + ServerName + "\r\nConnection: close\r\n" + "Date: " + DateTime.UtcNow + "\r\n" + "Allow: OPTIONS,GET,TRACE,HEAD\r\n" + "Content-Type: text/plain\r\n\r\nERROR 404: File not found");
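
The existence check itself can be sketched as below. This goes a little beyond the steps above: it also refuses resources such as /../secret.txt that would point outside the server's folder, which the "." + resource approach does not guard against. FileResolver and Resolve are hypothetical names.

```csharp
using System;
using System.IO;

public static class FileResolver
{
    // Maps a resource such as "/test.html" to a file path under root.
    // Returns null when the path leaves root or the file does not exist,
    // which is the case where the server should answer with a 404.
    public static string Resolve(string root, string resource)
    {
        string rootFull = Path.GetFullPath(root);
        string separator = Path.DirectorySeparatorChar.ToString();
        if (!rootFull.EndsWith(separator, StringComparison.Ordinal))
            rootFull += separator;

        // Drop the leading slash and any query string after '?'.
        string relative = resource.TrimStart('/');
        int query = relative.IndexOf('?');
        if (query >= 0) relative = relative.Substring(0, query);

        string full = Path.GetFullPath(Path.Combine(rootFull, relative));
        if (!full.StartsWith(rootFull, StringComparison.Ordinal)) return null;
        return File.Exists(full) ? full : null;
    }
}
```

A non-null result can be passed to File.ReadAllText for the 200 response, and a null result leads to the 404 response.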

Finally, you write the buffer to the stream:

WriteBuffer();

And now you're ready! Put a basic HTML file named test.html in the same directory as the executable, then open a browser, connect to the local IP and port you chose (such as 127.0.0.1:9080), and go to /test.html (127.0.0.1:9080/test.html). Then try to go to blorg.html and see what happens: test.html will show up, and blorg.html will show the 404 error you wrote!

Note before Continuing

I showed above how to send and receive data, and I will not show it again. You should be able to understand the commands below based on the information above. If you don't know how to do something, look above or leave a comment asking me to add it.

Head

Head is a very simple command and requires very little explanation. You just do the same thing as for the GET command, only without the actual file: you send the header and no body. This is meant to tell the client basic data about the file, link, etc. before it downloads or uses it. This is often used for checking information about hyperlinks. It's useless to the browser unless you add file data to your headers, but you should definitely always support it.
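
In code, the difference between the two commands can be as small as this sketch. HeadSupport and Build are made-up names, and the header string is assumed to already end with the empty line (\r\n\r\n).

```csharp
public static class HeadSupport
{
    // Returns the header alone for HEAD, and header plus body otherwise.
    // header must already end with the empty line ("\r\n\r\n").
    public static string Build(string command, string header, string body)
    {
        if (command == "HEAD") return header;
        return header + body;
    }
}
```

Building the header the same way for both commands keeps the HEAD answer honest about what a GET for the same resource would return.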

Trace

Trace is basically an "echo" command. All you do is repeat back to the client what it sent you. I expect you understand what this means. RFC2616 has you send the request back as the body of a 200 response with a Content-Type of message/http. It's used by the client to check for changes made to the request by intermediate servers. You should NOT change what was sent in ANY WAY.
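
Following RFC2616 section 9.8 strictly, the echo travels as the body of a 200 response with a Content-Type of message/http. A sketch under that reading, where TraceResponse and Build are hypothetical names, rawRequest is assumed to hold the complete request exactly as it was received, and the Content-Length item is an extra that is not required when the connection is closed afterwards:

```csharp
using System.Text;

public static class TraceResponse
{
    // Wraps the request, unchanged, as the body of a 200 response.
    public static string Build(string rawRequest)
    {
        int length = Encoding.UTF8.GetByteCount(rawRequest);
        return "HTTP/1.1 200 OK\r\n" +
               "Content-Type: message/http\r\n" +
               "Content-Length: " + length + "\r\n" +
               "\r\n" +
               rawRequest;
    }
}
```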

Options

Options tells the browser what commands it's allowed to use on a particular file. It should always look like this:

"HTTP/1.1 200 OK\r\nAllow: [Command],[Command],[Command]..."

It has no body. Unless you need special file passwords or something, you should always have GET enabled. The return for most files would be:

"HTTP/1.1 200 OK\r\nAllow: OPTIONS,GET,TRACE,HEAD\r\n\r\n"

Another thing: the client can ask for Allow specifically using the OPTIONS command, but you should put Allow in the header of ANY response. I always put the Allow header in my responses.
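
If the list of allowed commands differs per file, the response can be generated rather than hard-coded. A minimal sketch, with OptionsResponse and Build as invented names:

```csharp
public static class OptionsResponse
{
    // Builds an OPTIONS response whose Allow item lists the given commands.
    public static string Build(params string[] allowed)
    {
        return "HTTP/1.1 200 OK\r\nAllow: " + string.Join(",", allowed) + "\r\n\r\n";
    }
}
```

Calling OptionsResponse.Build("OPTIONS", "GET", "TRACE", "HEAD") gives the same string as the example for most files above.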

How to Handle Bad Input from the Client

Okay, if you have the above, you have all of the basic, vital, standard HTTP methods implemented. But you need error handling too! You need to let the client know what's happening using status codes. Here are the codes and messages you should use in your response if something fails beyond the normal 404.

If somebody isn't allowed to use a command, access a file, etc., you need to give them an error code 403 with the message "Forbidden" or similar. 403 means the client does not have permission to access that file with that command.

If a client gives you an invalid command syntax, you need to give a 400 error with the message "Bad Request". This is what you send whenever the client's command input is invalid.

The next error code is 500. This is what you do if you have trouble reading a file or something similar - an internal server error. Not surprisingly, "Internal Server Error" is also the message to go with it.

The final major error code you'll need is 501. This is what happens if you haven't implemented a command (at least yet). So if your server can't handle "POST" or "CONNECT" yet, you'd send a 501 with the message "Not Implemented". You would also send this if the client tries to use something not in the HTTP specifications, such as a custom command.
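
The codes above can be collected into one small helper so every error response has the same shape. This is only a sketch: ErrorResponse, MessageFor, and Build are hypothetical names, the messages are the standard ones from RFC2616, and the plain-text body follows the style of the 404 example earlier.

```csharp
public static class ErrorResponse
{
    // Returns the usual message for the error codes covered in this guide.
    public static string MessageFor(int code)
    {
        switch (code)
        {
            case 400: return "Bad Request";
            case 403: return "Forbidden";
            case 404: return "Not Found";
            case 500: return "Internal Server Error";
            case 501: return "Not Implemented";
            default: return "Error";
        }
    }

    // Builds a complete plain-text error response for the given code.
    public static string Build(int code)
    {
        string message = MessageFor(code);
        return "HTTP/1.1 " + code + " " + message + "\r\n" +
               "Connection: close\r\n" +
               "Content-Type: text/plain\r\n" +
               "\r\n" +
               "ERROR " + code + ": " + message;
    }
}
```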

More coming soon: Please check back later.