Computing: Website and Database Programming

Perl web programming: HTTP headers and CGI environment variables.

Accessing a resource (web page, image, audio file, video, ...) on the Internet essentially consists of a web client (web browser) sending a request for this resource to a web server (for example Apache), using the HTTP (or HTTPS) protocol (the "language" that the client and the server use when communicating with each other). The communication between client and server is done using so-called HTTP messages: the client sends an HTTP request, and the server answers with an HTTP response.

HTTP messages are made of a start line (the request line, e.g. GET /index.html HTTP/1.1, or the status line of a response, e.g. HTTP/1.1 200 OK), so-called HTTP headers, and an optional message body (request information sent to the server or data returned to the client). The headers are key-value pairs sent in HTTP requests and responses, providing essential information about the communication between the client and server. They include details such as content type, encoding, cache control, authentication, and more, helping manage the behavior of HTTP transactions.
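
To make the structure of an HTTP message more concrete, here is a minimal sketch that splits a raw request into its start line, headers, and body. The request text and the function name parse_http_message are made up for illustration; a real server does this work for you, and the sketch ignores details such as repeated or folded headers.

```perl
use strict;
use warnings;

# A hypothetical raw request, as a client might send it (CRLF line endings).
my $raw = join("\r\n",
    'POST /cgi-bin/form.pl HTTP/1.1',
    'Host: localhost',
    'Content-Type: application/x-www-form-urlencoded',
    'Content-Length: 9',
    '',
    'name=Anna',
);

# Split a raw HTTP message into start line, headers (hash ref), and body.
sub parse_http_message {
    my ($message) = @_;
    # The first empty line separates the head from the body.
    my ($head, $body) = split /\r?\n\r?\n/, $message, 2;
    my ($start_line, @header_lines) = split /\r?\n/, $head;
    my %headers;
    for my $line (@header_lines) {
        my ($name, $value) = split /:\s*/, $line, 2;
        $headers{lc $name} = $value;    # header names are case-insensitive
    }
    return ($start_line, \%headers, defined $body ? $body : '');
}

my ($start_line, $headers, $body) = parse_http_message($raw);
print "Start line: $start_line\n";
print "$_: $headers->{$_}\n" for sort keys %$headers;
print "Body: $body\n";
```

The header names are stored in lowercase here only to make lookups predictable; that is a choice of this sketch, not something HTTP prescribes.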

A request header is an HTTP header that can be used in an HTTP request to provide information about the request context, so that the server can tailor the response. For example, the Accept, Accept-Language, and Accept-Encoding headers indicate the allowed and preferred formats of the response. Other headers can be used to supply authentication credentials (e.g. Authorization), to control caching, or to give the server information about the user agent (identifying the client software) or the referrer (address of the previous webpage that linked to the requested resource).

A response header is an HTTP header that can be used in an HTTP response and that doesn't relate to the content of the message. Response headers such as Server (name of the server), Date (date/time when the message was sent), and Expires (date/time after which the response is considered stale) are used to give a more detailed context of the response.

Both requests and responses include representation headers (or "representation metadata", often subdivided into general and entity headers), that, among others, describe how to interpret the data contained in the message. Examples are: Content-Type (the MIME type of the data; e.g. text/html; charset=utf-8), Content-Length (in bytes), Content-Language.

For a complete list of the HTTP header fields, have a look at the corresponding article in Wikipedia.

CGI environment variables are a series of key-value pairs that the web server passes to every CGI script that it runs. Some of these, for example WINDIR and PATH, are operating system environment variables of the computer that the server is running on. Others, for example SERVER_SOFTWARE, SERVER_NAME, and SERVER_PORT, concern the web server itself. Some of them concern the remote host (the computer that the client runs on), e.g. REMOTE_HOST and REMOTE_ADDR. Others, for example REQUEST_METHOD and REQUEST_URI, directly concern the request that the client made. In the case of messages that have attached information (a message body), there are CONTENT_TYPE and CONTENT_LENGTH. The other request headers that the client sent to the server are passed on in variables whose names start with HTTP_, e.g. HTTP_ACCEPT_CHARSET, HTTP_ACCEPT_LANGUAGE, or HTTP_USER_AGENT. And finally, very important, in the case of a request with parameters passed to the called script by appending a "?" followed by the parameters to the URL: QUERY_STRING. For a list of the environment variables that are server, or server-client communication related, you might want to have a look at Using Environment Variables at the O'Reilly website.
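
The relation between request header names and the names of the corresponding environment variables follows a simple convention (described in the CGI specification, RFC 3875): uppercase the header name, replace "-" with "_", and put HTTP_ in front, except for Content-Type and Content-Length. Here is a minimal sketch of this mapping; the function name header_to_cgi_var is my own, and a given server may leave some headers out (Authorization, for example, is often not passed on).

```perl
use strict;
use warnings;

# Map an HTTP request header name to the name of the CGI environment
# variable under which a server would typically expose it.
sub header_to_cgi_var {
    my ($header) = @_;
    my $name = uc $header;
    $name =~ tr/-/_/;
    # Content-Type and Content-Length get no HTTP_ prefix.
    return $name if $name eq 'CONTENT_TYPE' || $name eq 'CONTENT_LENGTH';
    return "HTTP_$name";
}

for my $header ('User-Agent', 'Accept-Language', 'Content-Type') {
    printf "%-16s => %s\n", $header, header_to_cgi_var($header);
}
```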

If you compare the list of HTTP headers with the list of CGI environment variables, you notice that everything that might be important to the programmer is available as an environment variable. From the point of view of Perl, the CGI environment variables are stored in a hash called %ENV, and thus may be accessed directly by a Perl script (normally there isn't any need to access the HTTP messages directly). Perl distributions normally include a script called printenv.pl, with the following content:
    #!C:/Programs/Strawberry/win64/perl/bin/perl.exe
    use strict;
    use warnings;
    print "Content-type: text/plain; charset=iso-8859-1\n\n";
    foreach my $var (sort(keys(%ENV))) {
        my $val = $ENV{$var};
        $val =~ s|\n|\\n|g;
        $val =~ s|"|\\"|g;
        print "${var}=\"${val}\"\n";
    }

Notes:

  1. You have to change the first line of the script (shebang line) by indicating the actual path to your Perl interpreter.
  2. As this script displays information not only about your web server, but also about your operating system, you must never leave this script in the cgi-bin directory of an Internet web server!

Place the script in the cgi-bin directory of Apache, and run it by entering localhost/cgi-bin/printenv.pl in the address bar of your web browser. The screenshot shows the output of the script on my Windows 10 computer, running the Apache 2 web server.

Perl CGI environment variables on Windows 10 with Apache web server

Among other things, you can see that the web server is an Apache 2.4.46 64-bit (with OpenSSL, PHP8, and mod_jk installed), running on localhost (127.0.0.1) on port 80. The client, Mozilla Firefox 135.0, connected from localhost and made a request for /cgi-bin/printenv.pl, awaiting an answer that may, among others, be of MIME type text/html, the preferred language being US-English.

Here is a very simple Perl script example (localhost_only.pl), that checks from where the client connected, and only continues execution if the "remote" host is localhost.
    use strict;
    use warnings;
    print "Content-type: text/plain; charset=iso-8859-1\n\n";
    # REMOTE_ADDR is the IP address of the client (empty string if not set)
    my $host = $ENV{'REMOTE_ADDR'} // '';
    if ($host eq '127.0.0.1') {
        print("You connected from localhost. That's ok to continue...\n");
    }
    else {
        print("Sorry, this script may only be run from localhost!\n");
    }

Note that even with http://wk-win10/cgi-bin/localhost_only.pl (where "wk-win10" is the name of the computer that Apache runs on), the script execution is refused: the request then typically arrives via the computer's network address, so REMOTE_ADDR is not 127.0.0.1. It's mandatory to "call" the script with the URL localhost/cgi-bin/localhost_only.pl.
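
The check above only accepts the exact address 127.0.0.1. If the browser connects to localhost over IPv6, REMOTE_ADDR would be ::1 instead. Here is a minimal sketch of a slightly more tolerant check; the helper name is_loopback is my own, and the pattern simply accepts anything of the form 127.x.x.x without validating the number ranges.

```perl
use strict;
use warnings;

# Return true if the given address is an IPv4 or IPv6 loopback address.
sub is_loopback {
    my ($addr) = @_;
    return 0 unless defined $addr;
    return 1 if $addr =~ /^127\.\d{1,3}\.\d{1,3}\.\d{1,3}$/;   # any 127.x.x.x address
    return 1 if $addr eq '::1';                               # IPv6 loopback
    return 0;
}

print "Content-type: text/plain; charset=iso-8859-1\n\n";
if (is_loopback($ENV{REMOTE_ADDR})) {
    print "You connected from localhost. That's ok to continue...\n";
}
else {
    print "Sorry, this script may only be run from localhost!\n";
}
```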

If we wanted to get the HTTP request headers rather than reading the CGI environment variables, we could use the http() method of CGI.pm, as in the following script:
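    use strict;
    use warnings;
    use CGI;
    my $cgi = CGI->new;
    # http() without argument returns the names of all HTTP_* variables;
    # http($name) returns the value of one of them
    my %headers = map { $_ => $cgi->http($_) } $cgi->http();
    print $cgi->header('text/plain');
    print "HTTP request headers:\n";
    print "=====================\n\n";
    # Sort the names, so that the output order is the same on each run
    for my $header (sort keys %headers) {
        print "$header: $headers{$header}\n";
    }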

I placed the script (that I named httpheaders.pl) in the cgi-bin directory of Apache, and ran it, entering https://wk-win10/cgi-bin/httpheaders.pl in the address bar of Google Chrome. The screenshot shows the output.

HTTP request headers when accessing an Apache web server with Google Chrome

Accessing the web server response.

"The libwww-perl collection is a set of Perl modules which provides a simple and consistent application programming interface (API) to the World-Wide Web. The main focus of the library is to provide classes and functions that allow you to write WWW clients. The library also contain modules that are of more general use and even classes that help you implement simple HTTP servers. Most modules in this library provide an object oriented API. The user agent, requests sent and responses received from the WWW server are all represented by objects. This makes a simple and powerful interface to these services. The interface is easy to extend and customize for your own needs." (text from the LWP description page at MetaCPAN).

The libwww-perl request object has the class name HTTP::Request. Note that the HTTP:: prefix of the class name only implies that we use the HTTP model of communication. It does not limit the kind of services we can try to pass this request to. For instance, we can send HTTP::Requests to ftp and gopher servers, as well as to the local file system.

We can create a HTTP::Request object using the following code:
    use HTTP::Request;
    # A POST request with URL-encoded form data as message body
    my $request = HTTP::Request->new(POST => 'http://search.cpan.org/search');
    $request->content_type('application/x-www-form-urlencoded');
    $request->content('query=libwww-perl&mode=dist');

To send the request to the webserver, we create our own user agent, using a LWP::UserAgent object:
    use LWP::UserAgent;
    my $ua = LWP::UserAgent->new;
    $ua->agent("Some_Custom_Name");

HTTP style response messages are implemented in the HTTP::Response class. HTTP response objects are returned by the request() method of the LWP::UserAgent object:
    my $response = $ua->request($request);
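
To get a feeling for the HTTP::Response object without making a network connection, you can also build one by hand. The following is only a sketch: the status, header values, and body are made up, and in a real script the object would come from $ua->request($request) as shown above.

```perl
use strict;
use warnings;
use HTTP::Response;

# Build a response by hand (hypothetical values), in place of $ua->request(...).
my $response = HTTP::Response->new(
    200, 'OK',
    [ 'Content-Type' => 'text/html; charset=utf-8', 'Server' => 'Apache' ],
    "<html><body>Hello</body></html>\n",
);

if ($response->is_success) {
    print "Status:       ", $response->status_line, "\n";
    print "Content-Type: ", $response->header('Content-Type'), "\n";
    print "Body:\n", $response->content;
}
else {
    print "Request failed: ", $response->status_line, "\n";
}
```

The methods used here (is_success, status_line, header, content) are the same ones that you would call on a response received from a real server.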

Here is the code of my script httpresp.pl, which prints the HTTP response headers when accessing my local website on Apache 2:
    use strict;
    use warnings;
    use LWP::UserAgent;
    print "Content-type: text/plain; charset=utf-8\n\n";
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyPerlScript/1.0");
    # A plain GET request has no message body, so no content type or content is set
    my $request = HTTP::Request->new(GET => 'http://127.0.0.1/index.html');
    my $response = $ua->request($request);
    if ($response->is_success) {
        # All response headers, one "Name: value" pair per line
        print $response->headers_as_string, "\n";
    }
    else {
        # Status code and message, e.g. "404 Not Found"
        print $response->status_line, "\n";
    }

And here is the output of the script.

HTTP response headers when accessing an Apache web server

Note: If you wonder why I use the IP address here, the reason is that when using the URL localhost/index.html, I get the error message 400 URL must be absolute, and when specifying the absolute URL http://localhost/index.html, I get the error 500 Can't connect to localhost:80. I did not have this problem when connecting to an Internet server...

HTTP::Response can also be used to get the response body (i.e. the content of the web page that we access). Here is the code of my script httpresp2.pl to do so:
    use strict;
    use warnings;
    use LWP::UserAgent;
    print "Content-type: text/plain; charset=utf-8\n\n";
    my $ua = LWP::UserAgent->new;
    $ua->agent("MyPerlScript/1.0");
    # A plain GET request has no message body, so no content type or content is set
    my $request = HTTP::Request->new(GET => 'http://127.0.0.1/index.html');
    my $response = $ua->request($request);
    if ($response->is_success) {
        # The raw response body, here the HTML code of the page
        print $response->content, "\n";
    }
    else {
        print $response->status_line, "\n";
    }

The output of the script is the HTML code of the web page.

HTTP response body (content) when accessing a web page

CGI environment variables and webform data access.

Most Perl scripts that work with data that has been entered into a web form use CGI.pm to read this data. CGI.pm is considered legacy. The module was removed from the Perl core in version 5.22, so it is no longer part of all Perl distributions, and on several web sites, you can read sentences like "Do whatever you want, but, please, do not use CGI.pm". I'm not sure that such disapproval is really justified. CGI.pm is not the fastest method, and can be a security risk under certain circumstances. But Perl distributions like Strawberry Perl (at least the Windows release) continue to include it in their default installation. And I think that using the module to read form data (the "good part" of CGI.pm, as they call it on some website) is acceptable (and in a simple script more useful than using one of those complex web frameworks).

Anyway, there is a simple way to retrieve form data without CGI.pm (and without the need for any other special module): accessing the HTTP request message directly. This is possible if we know the value of certain HTTP request headers, values that we get by reading the corresponding CGI environment variables. I will (one day) write a tutorial about how to implement this...
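
In the meantime, here is a minimal sketch of the idea, and not a complete implementation. It assumes that the form data is URL-encoded (content type application/x-www-form-urlencoded, not multipart/form-data); the function names url_decode, parse_form_data, and read_form_data are my own. For a GET request, the data is taken from QUERY_STRING; for a POST request, CONTENT_LENGTH bytes are read from standard input.

```perl
use strict;
use warnings;

# Decode one URL-encoded component: "+" means space, %XX is a byte in hex.
sub url_decode {
    my ($text) = @_;
    $text =~ tr/+/ /;
    $text =~ s/%([0-9A-Fa-f]{2})/chr(hex($1))/ge;
    return $text;
}

# Turn "a=1&b=two+words" into a hash (a repeated name keeps its last value).
sub parse_form_data {
    my ($data) = @_;
    my %form;
    for my $pair (split /[&;]/, $data) {
        my ($name, $value) = split /=/, $pair, 2;
        next unless defined $name && length $name;
        $form{ url_decode($name) } = url_decode(defined $value ? $value : '');
    }
    return %form;
}

# GET data comes from QUERY_STRING; POST data is read from STDIN,
# exactly CONTENT_LENGTH bytes of it.
sub read_form_data {
    my $method = $ENV{REQUEST_METHOD} || 'GET';
    my $data   = '';
    if ($method eq 'POST') {
        read(STDIN, $data, $ENV{CONTENT_LENGTH} || 0);
    }
    else {
        $data = defined $ENV{QUERY_STRING} ? $ENV{QUERY_STRING} : '';
    }
    return parse_form_data($data);
}

my %form = read_form_data();
print "Content-type: text/plain; charset=utf-8\n\n";
print "$_ = $form{$_}\n" for sort keys %form;
```

The decoded values are raw bytes; a real script would also have to deal with the character encoding, limit the accepted CONTENT_LENGTH, and handle fields that occur several times.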

