How to parse HTML using PHP

Last updated on October 4th, 2022 at 01:57 pm

HTML and XML are closely related markup languages. HTML is not strictly a subset of XML, but both are built from nested tags, so if you already know how to parse an XML file using PHP, this script will be easy to understand.

Here we are going to parse a web page made of simple HTML tags such as table, tr and td.

Let us name the HTML page Simple_Webpage.html and add the code below.

<html>
  <body>
    <table>
      <tr>
        <td><b>Country</b></td>
        <td><b>Temp {F}</b></td>
        <td><b>Current Status</b></td>
      </tr>
      <tr>
        <td>United States</td>
        <td>74</td>
        <td>Sunny</td>
      </tr>
      <tr>
        <td>United Kingdom</td>
        <td>65</td>
        <td>Sunny</td>
      </tr>
      <tr>
        <td>India</td>
        <td>94</td>
        <td>Sunny</td>
      </tr>
    </table>
  </body>
</html>

The next step is to parse the above HTML page using the PHP code below. As you can see, we create a new DOMDocument to represent the entire HTML document and load the file we created above into it with loadHTMLFile(). The call returns true on success or false on failure, and that result is stored in a variable named html.

<?php
  // new dom object
  $dom = new DOMDocument();

  // discard white space (set this before the document is loaded)
  $dom->preserveWhiteSpace = false;

  // load the html; $html holds true on success, false on failure
  $html = $dom->loadHTMLFile('Simple_Webpage.html');
  if ($html === false)
  {
      die('Could not load Simple_Webpage.html');
  }

  // get the tables by tag name
  $tables = $dom->getElementsByTagName('table');

  // get all rows from the first table
  $rows = $tables->item(0)->getElementsByTagName('tr');

  // loop over the table rows
  foreach ($rows as $row)
  {
      // get each column by tag name
      $cols = $row->getElementsByTagName('td');

      // skip rows that do not have all three columns
      if ($cols->length < 3)
      {
          continue;
      }

      // echo the values
      echo $cols->item(0)->nodeValue.'<br />';
      echo $cols->item(1)->nodeValue.'<br />';
      echo $cols->item(2)->nodeValue.'<hr>';
  }

?>
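
If you would rather collect the values than echo them straight away, the same DOMDocument calls can be wrapped in a small function. The following is only a minimal sketch under a few assumptions: parseTableRows() is a hypothetical helper name that is not part of the script above, the HTML is passed in as a string and loaded with loadHTML() instead of loadHTMLFile(), and parser warnings are silenced with libxml_use_internal_errors() because real-world pages are often not perfectly formed.

```php
<?php
// Hypothetical helper (not part of the original script): returns the rows
// of the first <table> in an HTML string as an array of cell-text arrays.
function parseTableRows(string $html): array
{
    // Collect parser warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);

    $dom = new DOMDocument();
    $loaded = $dom->loadHTML($html);

    // Discard the collected warnings and restore the previous setting.
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if ($loaded === false) {
        return [];
    }

    // No table in the document: nothing to return.
    $table = $dom->getElementsByTagName('table')->item(0);
    if ($table === null) {
        return [];
    }

    $rows = [];
    foreach ($table->getElementsByTagName('tr') as $tr) {
        $cells = [];
        foreach ($tr->getElementsByTagName('td') as $td) {
            // textContent also includes text from nested tags such as <b>.
            $cells[] = trim($td->textContent);
        }
        // Ignore rows without any <td> cells.
        if ($cells !== []) {
            $rows[] = $cells;
        }
    }

    return $rows;
}

// Sample markup for illustration, shortened from Simple_Webpage.html.
$html = '<table>'
      . '<tr><td><b>Country</b></td><td><b>Temp {F}</b></td><td><b>Current Status</b></td></tr>'
      . '<tr><td>India</td><td>94</td><td>Sunny</td></tr>'
      . '</table>';

$rows = parseTableRows($html);

// The first row holds the column titles, so separate it from the data.
$header = array_shift($rows);

foreach ($rows as $row) {
    echo $row[0] . ': ' . $row[1] . ' (' . $row[2] . ')' . PHP_EOL;
}
```

Returning an array keeps the parsing separate from the output, so the same rows could be printed as HTML, written to a file or stored in a database.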


You might also be interested in parsing XML using PHP. If so, feel free to refer to this tutorial:

Simple XML Parsing Using PHP (with demo)
