Fetching a Web Page From Your PHP Code

Nov 4, 2008 | Tags: PHP, HTTP

To fetch a web page from your PHP application, you could use the curl functions or simply open the page with fopen(). Both have limitations, though: the first needs PHP built with the curl extension enabled, and the second needs the allow_url_fopen setting turned on so that scripts may open URLs with fopen().
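If you would rather choose an approach at runtime, a minimal sketch like the one below can check what the server allows. It assumes that availability comes down to whether the curl extension is loaded (tested through the existence of curl_init()) and whether allow_url_fopen is on; pick_fetch_method() is a hypothetical name, not a PHP built-in.

```php
<?php
/*
 * Hypothetical helper: report which fetching approach this server supports.
 * Checks the curl extension first, then allow_url_fopen, and falls back to
 * raw sockets (fsockopen), which need neither.
 */
function pick_fetch_method()
{
    if (function_exists('curl_init')) {
        return 'curl';
    }

    /* ini_get() returns the setting as a string such as "1", "On" or "" */
    if (filter_var(ini_get('allow_url_fopen'), FILTER_VALIDATE_BOOLEAN)) {
        return 'fopen';
    }

    return 'socket';
}

echo pick_fetch_method(), "\n";
```

The socket branch is the case the rest of this article deals with.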

If neither is available, you can still fetch a web page by opening a socket connection to the remote host and making the HTTP request yourself. It looks something like this:

Listing 1: fetch_page.php

    <?php
    function fetch_page($url)
    {
        /* get hostname and path */
        $host = parse_url($url, PHP_URL_HOST);
        $path = parse_url($url, PHP_URL_PATH);

        if (empty($path)) {
            $path = "/";
        }

        /* Build HTTP 1.0 request header. Defined in RFC 1945.
           Host is not part of RFC 1945, but name-based virtual
           hosts need it to serve the right site. */
        $headers = "GET $path HTTP/1.0\r\n"
                 . "Host: $host\r\n"
                 . "User-Agent: myHttpTool/1.0\r\n\r\n";

        /* open socket connection to remote host on port 80 */
        $fp = fsockopen($host, 80, $errno, $errmsg, 30);

        if (!$fp) {
            /* ...some error handling... */
            return false;
        }

        /* send request headers */
        fwrite($fp, $headers);

        /* read response; initialise the buffer before appending */
        $resp = "";
        while (!feof($fp)) {
            $resp .= fgets($fp, 4096);
        }
        fclose($fp);

        /* separate header and body */
        $neck = strpos($resp, "\r\n\r\n");
        if ($neck === false) {
            return false; /* no complete header block received */
        }
        $head = substr($resp, 0, $neck);
        $body = substr($resp, $neck + 4);

        /* omit parsing response headers */

        /* return page contents */
        return $body;
    }
    ?>
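Listing 1 skips the response headers, but they carry the status code and headers such as Location, which you need for error handling and redirects. Here is a minimal sketch of parsing them, assuming a raw response string like $resp in Listing 1; parse_response() is a hypothetical helper, and it keeps only the last value of a repeated header.

```php
<?php
/*
 * Hypothetical helper: split a raw HTTP response into status code,
 * headers and body. Header names are lower-cased so lookups are
 * case-insensitive; a repeated header keeps only its last value.
 * Returns false if the response has no valid status line or header block.
 */
function parse_response($resp)
{
    $neck = strpos($resp, "\r\n\r\n");
    if ($neck === false) {
        return false; /* no complete header block */
    }

    $lines = explode("\r\n", substr($resp, 0, $neck));
    $body  = substr($resp, $neck + 4);

    /* the status line looks like "HTTP/1.0 200 OK" */
    if (!preg_match('#^HTTP/[\d.]+\s+(\d{3})#', array_shift($lines), $m)) {
        return false;
    }

    $headers = array();
    foreach ($lines as $line) {
        $colon = strpos($line, ':');
        if ($colon === false) {
            continue; /* skip malformed header lines */
        }
        $name = strtolower(trim(substr($line, 0, $colon)));
        $headers[$name] = trim(substr($line, $colon + 1));
    }

    return array((int) $m[1], $headers, $body);
}

$raw = "HTTP/1.0 301 Moved Permanently\r\n"
     . "Location: http://www.example.com/\r\n"
     . "Content-Type: text/html\r\n\r\n"
     . "<html>moved</html>";

list($status, $headers, $body) = parse_response($raw);
echo $status, ' ', $headers['location'], "\n"; // prints "301 http://www.example.com/"
```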

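With the function above you should be able to fetch a page without any dependencies. However, it lacks many important features: it does not follow redirects, it speaks only HTTP 1.0, it always connects to port 80 (so https:// URLs and explicit ports are not handled), and it drops the query string from the URL. You may want to add a loop that follows redirects and add support for HTTP 1.1. For a more complete HTTP client, see my EasyWebFetch class.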
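As a starting point for the redirect loop, here is a minimal sketch. It assumes a fetch function that returns the integer status code, the lower-cased headers and the body; Listing 1 returns only the body, so it would need extending before it could be plugged in. fetch_following_redirects() is a hypothetical name, and relative Location values other than "/path" and "//host/path" are deliberately not resolved.

```php
<?php
/*
 * Hypothetical redirect loop. $fetch is any callable that takes a URL and
 * returns array(status, headers, body), with an integer status and
 * lower-cased header names.
 * Returns the final body, or false on too many or unsupported redirects.
 */
function fetch_following_redirects($url, $fetch, $max_redirects = 5)
{
    $redirect_codes = array(301, 302, 303, 307, 308);

    for ($i = 0; $i <= $max_redirects; $i++) {
        list($status, $headers, $body) = call_user_func($fetch, $url);

        if (!in_array($status, $redirect_codes, true)
            || empty($headers['location'])) {
            return $body; /* not a redirect: this is the final page */
        }

        $location = $headers['location'];

        if (parse_url($location, PHP_URL_SCHEME) === null) {
            $parts = parse_url($url);
            if (substr($location, 0, 2) === '//') {
                /* protocol-relative: "//host/path" keeps the current scheme */
                $location = $parts['scheme'] . ':' . $location;
            } elseif ($location[0] === '/') {
                /* absolute path: keep the current scheme, host and port */
                $location = $parts['scheme'] . '://' . $parts['host']
                          . (isset($parts['port']) ? ':' . $parts['port'] : '')
                          . $location;
            } else {
                return false; /* relative paths are not resolved in this sketch */
            }
        }

        $url = $location;
    }

    return false; /* gave up after $max_redirects redirects */
}
```

Passing the fetcher in as a callable lets you test the loop with a closure that returns canned responses instead of contacting real servers. The limit of five redirects is an arbitrary choice; its job is to stop the loop when two URLs redirect to each other.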

EasyWebFetch is a simple class for fetching web pages from your application. It is an alternative if your server doesn't have PHP with curl enabled, or if the PHP configuration doesn't allow opening URLs with fopen(). The class fetches a web page by opening a socket connection to the remote host, so it has no dependencies and should work on any server configuration.

Features:

  • No dependencies
  • Supports HTTP 1.1
  • Follows redirects
  • Supports proxies
  • Supports the HEAD and GET methods. HEAD retrieves only the response headers, which is useful for link checker applications
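Of these features, HTTP 1.1 support changes response handling the most: once the request line says HTTP/1.1, a server may send the body with "Transfer-Encoding: chunked", and the plain header/body split in Listing 1 would then return the body with the chunk sizes still embedded. A minimal sketch of decoding such a body, following the chunked format in RFC 2616 section 3.6.1; decode_chunked() is a hypothetical helper, not part of EasyWebFetch.

```php
<?php
/*
 * Hypothetical helper: decode a body sent with "Transfer-Encoding: chunked".
 * Each chunk is a hex size line, CRLF, that many bytes, CRLF; a chunk of
 * size 0 ends the body. Chunk extensions (";name=value") are ignored and
 * any trailer headers after the last chunk are dropped.
 * Returns false if the data is malformed or truncated.
 */
function decode_chunked($data)
{
    $out = '';
    $pos = 0;

    while (true) {
        $eol = strpos($data, "\r\n", $pos);
        if ($eol === false) {
            return false; /* size line missing or incomplete */
        }

        /* the size line may carry extensions after ';' */
        $size_line = substr($data, $pos, $eol - $pos);
        $parts = explode(';', $size_line, 2);
        $size_hex = trim($parts[0]);

        if (!preg_match('/^[0-9A-Fa-f]+$/', $size_hex)) {
            return false; /* malformed size line */
        }

        $size = hexdec($size_hex);
        if ($size == 0) {
            return $out; /* last chunk reached */
        }

        $start = $eol + 2;
        if (strlen($data) < $start + $size + 2) {
            return false; /* chunk data cut short */
        }

        $out .= substr($data, $start, $size);
        $pos = $start + $size + 2; /* skip the data and its trailing CRLF */
    }
}

echo decode_chunked("4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n"), "\n"; // prints "Wikipedia"
```

Note that an HTTP 1.1 request also needs a Host header, and, because HTTP 1.1 connections stay open by default, a "Connection: close" header so that a feof() read loop like the one in Listing 1 ends as soon as the response is complete.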

10 Comments

Abbas Khan on Aug 27, 2008:

Thanks Nasharuddin. That's really a cool program of yours.
Looking into curl screwed my head! LOL

Binny V A on Oct 23, 2008:

I wrote a similar program for this - try
http://www.bin-co.com/php/scripts/load/

blaaze on Dec 14, 2008:

(1/3) Can anyone help me here? I'm looking for a script that needs to satisfy these things: 1. it must be in PHP; 2. it must work like a robot: when initiated, it must run automatically on the server side only

Taree on Jul 5, 2009:

Hi, nice blog. I came from Hotscripts but the link is broken and the script is gone.

Nash on Jul 5, 2009:

I've written an advanced HTTP client using pure PHP (no curl). Fetching web pages is very easy:

<?php
include 'phpWebHacks.php';

$h = new phpWebHacks;
$page = $h->get('http://google.com');
?>
See HTTP Scripting Made Really Easy for more info about the tool.

Avijit on Feb 2, 2010:

Using the above code I can fetch many websites like Yahoo, YouTube, etc., but I can't fetch https://www.blogger.com/start
Can anyone say why?

article volcano on Feb 7, 2010:

Ah ha! Brilliant. Just what I was looking for. I started to write my own version which was similar to this. But this is better!

Chris on May 7, 2010:

Thanks. I made a quick tweak to the code so it doesn't strip out query strings:

function fetch_page($url) {
  /* get hostname and path */
  $host = parse_url($url, PHP_URL_HOST);
  $path = parse_url($url, PHP_URL_PATH);
  $query = parse_url($url, PHP_URL_QUERY);

  if (empty($path)) {
    $path = "/";
  }

  if (!empty($query)) {
    $path .= "?$query";
  }

  /* ...rest of the function unchanged from Listing 1... */
}

Anonymous on Oct 27, 2010:

Nice script. Do you know if PHP uses caching, or will this always re-download the page?

Nash on Oct 28, 2010:

This always re-downloads the page.
