Extract data from HTML to XML, import to Google Sheets

Posted By: Ian on Jan 30, 2015 in PHP, Snippets, XML

Question

I use Google sheets as a daily menu system for residents in a home and it works great but now the good old EU Food Information for Consumers Regulation (1169/2011) means i have to be able to show whats in each of our dishes.

Answer

The “Top 14” allergens according to them are

  1. Celery 
  2. Gluten 
  3. Crustaceans 
  4. Eggs 
  5. Fish 
  6. Lupin 
  7. Milk 
  8. Mollusc 
  9. Mustard 
  10. Nuts 
  11. Peanuts 
  12. Sesame seeds 
  13. Soya 
  14. Sulphur Dioxide 
  15. Banana <– Extra one i added

Make sure all your recipes (.html format) include Title, Category and Allergen information.


French Onion Soup  <– Title

Soups  <– Category

Allergy advice

Contains:- Celery  <– Allergens

Ingredients

  • Butter
  • 5lb. Spanish onions (sliced)
  • 4 litres beef stock (knor)
  • Brown sugar (Sprinkle)
  • Splash of sherry
  • 2 x tbsp thyme

Method

  1. Fry onions in butter until golden brown (10-15 minutes)
  2. Add sugar and sherry
  3. Add thyme and stock and bring to boil
  4. Reduce and simmer for 45 minutes or until cooked
  5. Adjust seasoning and consistency if necessary

Now we only need to get the information in blue from the recipe above before automatically inserting it to our spreadsheet.
See the following PHP script which takes the information we need and inserts it into an XML file.

<?php
$xmlFileName  = "recipes.xml";
//  Creates a new SimpleXMLElement object 
$xml          = new SimpleXMLElement('<xml/>');
//  Path to recipe folder
$recipeFolder = "http://mental-man.com/recipes/";
//  Find all files that end with .html and use each one as recipe
foreach (glob('*.html') as $recipeName) {
    //  Set allergen variable with empty value 
    $allergenString = "";
    $doc            = new DOMDocument();
    @$doc->loadHTMLFile($recipeFolder . $recipeName);
    $xpath                = new DOMXPath($doc);
    $category             = $xpath->query("//div[@id = 'category']")->item(0)->nodeValue;
    $title                = $xpath->query("//title")->item(0)->nodeValue;
    $allergen             = $xpath->query("//h2[@id = 'allergens']")->item(0)->nodeValue;
    $capsInsensitiveNode  = strtolower($allergen);
    $allergenArray        = array(
        "Celery",
        "Gluten",
        "Crustaceans",
        "Eggs",
        "Fish",
        "Lupin",
        "Milk",
        "Mollusc",
        "Mustard",
        "Nuts",
        "Peanuts",
        "Sesame seeds",
        "Soya",
        "Sulphur Dioxide",
        "banana"
    );
    $capsInsensitiveArray = array_map('strtolower', $allergenArray);
    $stringToArrayNode    = explode(',', $capsInsensitiveNode);
    for ($i = 0; $i < count($capsInsensitiveArray); $i++) {
        if (in_array($capsInsensitiveArray[$i], $stringToArrayNode)) {
            $allergenString .= "1,";
        } //  in_array($capsInsensitiveArray[$i], $stringToArrayNode)
        else {
            $allergenString .= "0,";
        }
    } //  $i = 0; $i < count($capsInsensitiveArray); $i++
    $dropLastCharAllegens = substr($allergenString, 0, strlen($allergenString) - 1);
    $recipe               = $xml->addChild('recipe');
    $recipe->addChild('filename', $recipeFolder . $recipeName);
    $recipe->addChild('category', $category);
    $recipe->addChild('title', $title);
    $recipe->addChild('allergens', $dropLastCharAllegens);
} //  glob('*.html') as $recipeName
//  Show xml file
Header('Content-type: text/xml');
print($xml->asXML());
//  Save xml file
$xml->asXML($xmlFileName);
//  Open and format xml file
$simplexml               = simplexml_load_file($xmlFileName);
$dom                     = new DOMDocument('1.0');
$dom->preserveWhiteSpace = false;
$dom->formatOutput       = true;
$dom->loadXML($simplexml->asXML());
$xml = new SimpleXMLElement($dom->saveXML());
$xml->saveXML($xmlFileName);
?> 


Example XML file.


<?xml version=”1.0″?>
<xml>
<recipe>
<filename>http://mental-man.com/recipes/french-onion-soup.html</filename><category>Soups</category>
<title>French Onion Soup</title>
<allergens>1,0,0,0,0,0,0,0,0,0,0,0,0,0,0</allergens>
</recipe>
</xml>


Now using IMPORTXML we can import the XML file into our spreadsheet and use it as we require.

=IMPORTXML("http://mental-man.com/recipes/recipes.xml", "//recipe")

Job done!

AnonymousIan.J.Gough

Did this help you? Please Let me know by commenting below even if it’s just to say “Thanks”

leave a comment

About This Site

Dreams are built from lines of code well it's true!
If you can dream it you can usually code it.

I have learned so much from other people on the Internet and this site is for me to give back some of what i have learned and hope people can now learn from me.
Have fun and surf safely,
Ian.J.Gough

protected by copyscape duplicate content check

Categories


Powered by Banner Bar 125