PHP Get unsubscribe URL from email body

Time:01-27

I have an email's HTML body and need to extract just the unsubscribe link from it. If anywhere in the DOM there is a link containing the word "Unsubscribe", I need to return the URL of that specific link. I have tried several regexes, but they either match more than the unsubscribe URL or match nothing at all.

$regexp = "<a\s[^>]*href=(\"??)([^\" >]*?)\\1[^>]*>(.*(?:unsubscribe).*)<\/a>";
preg_match_all("/$regexp/iU", $body, $matches);
var_dump($matches);

This does not work :/

Thanks

CodePudding user response:

I couldn't quickly find a solution to your problem with just regex so I hope you're cool with using a bit more PHP than regex.

Here's what I came up with:

$regexp = '<a\s+(?:[^>]*?\s+)?href=[\'|"]([^"]*)[\'|"]>(.*?)<\/a>';
preg_match_all("/$regexp/i", $body, $matches);

$urls = $matches[1];
$tagContents = $matches[2];

$unsubscribeUrls = [];
for ($i = 0; $i < count($tagContents); $i++) {
    if(!isset($urls[$i]) || !isset($tagContents[$i])){
        continue;
    }
    if(stripos($tagContents[$i],  "unsubscribe") !== false){
        $unsubscribeUrls[] = $urls[$i];
    }
}
var_dump($unsubscribeUrls);

This first matches all <a> tags and splits them into URLs and tag contents. Then, in PHP, it checks whether each tag's content contains "unsubscribe" (case-insensitively, via stripos). If it does, the URL is added to $unsubscribeUrls, which ends up holding all the URLs you want.
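
One thing to watch for: the pattern above expects ">" right after the href's closing quote, so a link with further attributes after href is skipped, and without the s modifier link text that spans several lines is not matched. Below is a rough sketch of a more tolerant variant. The function name unsubscribeUrlsByRegex and the sample $body are made up for illustration, and the strip_tags / html_entity_decode steps are additions that are not part of the answer above.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function unsubscribeUrlsByRegex(string $body): array
{
    // Group 1: opening quote, group 2: href value, group 3: link contents.
    // [^>]* after the back-reference allows more attributes after href;
    // the s modifier lets . match newlines inside the link.
    $pattern = '/<a\s+(?:[^>]*?\s+)?href=(["\'])(.*?)\1[^>]*>(.*?)<\/a>/is';
    preg_match_all($pattern, $body, $matches, PREG_SET_ORDER);

    $urls = [];
    foreach ($matches as $m) {
        // strip_tags() removes nested markup so only the visible text is checked.
        if (stripos(strip_tags($m[3]), 'unsubscribe') !== false) {
            // Decode entities such as &amp; in the attribute value (an added step).
            $urls[] = html_entity_decode($m[2]);
        }
    }
    return $urls;
}

// Sample input, invented for this sketch.
$body = '<p><a href="https://example.com/">Home</a> '
      . '<a class="x" href="https://example.com/u?id=1&amp;t=2" style="color:red">'
      . '<span>Unsubscribe</span></a></p>';
var_dump(unsubscribeUrlsByRegex($body));
```

Regex is still fragile on real-world email HTML (unquoted attributes, nested or unclosed tags), so treat this as a quick filter rather than a parser.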

CodePudding user response:

You can use DOMXPath to select every anchor whose text contains a case-insensitive match for "unsubscribe", then call getAttribute("href") on each match to get the URL.

$data = <<<DATA
This is a link <a href="https://stackoverflow.com/">SO</a> and this is <a href="http://test.test">unsubscribe</a> and 
another and this is <a href="http://test.test">UnSubScribe</a>.
DATA;

$dom = new DOMDocument();
$dom->loadHTML($data);
$xpath = new DOMXPath($dom);
$query = "//a[contains(translate(., 'UNSUBSCRIBE', 'unsubscribe'),'unsubscribe')]";
$anchors = $xpath->query($query);

foreach ($anchors as $a) {
    echo sprintf("%s: %s" . PHP_EOL,
        $a->nodeValue,
        $a->getAttribute("href")
    );
}

Output

unsubscribe: http://test.test
UnSubScribe: http://test.test
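
For real email bodies, the same idea could be wrapped in a small function. This is only a sketch under a few assumptions: the name findUnsubscribeUrls is hypothetical, the extra check on the href value and the de-duplication are additions to the answer above, and libxml warnings are suppressed because email HTML is often malformed.

```php
<?php
// Hypothetical helper: the name and signature are illustrative only.
function findUnsubscribeUrls(string $html): array
{
    // loadHTML() rejects an empty string, so bail out early.
    if (trim($html) === '') {
        return [];
    }

    $dom = new DOMDocument();
    // Collect libxml parse warnings internally instead of emitting them.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $upper = 'UNSUBSCRIBE';
    $lower = 'unsubscribe';
    // Match on the link text or on the href value itself (the latter is an addition).
    $query = "//a[@href][contains(translate(., '$upper', '$lower'), '$lower')"
           . " or contains(translate(@href, '$upper', '$lower'), '$lower')]";

    $urls = [];
    foreach ($xpath->query($query) as $a) {
        $urls[] = $a->getAttribute('href');
    }
    // Drop duplicates while keeping document order.
    return array_values(array_unique($urls));
}

// Sample input, invented for this sketch.
$html = '<p><a href="http://a.test/">Home</a> '
      . '<a href="http://b.test/x">UnSubScribe</a> '
      . '<a href="http://c.test/Unsubscribe?u=1">click here</a> '
      . '<a href="http://b.test/x">unsubscribe</a></p>';
print_r(findUnsubscribeUrls($html));
```

Checking the href as well catches links whose visible text is something like "click here"; drop that half of the XPath condition if it produces false positives for your mail.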

