Simple HTML Dom Parser: How to insert to elements - html-parsing

I am trying to insert (append) into an element ... "body" specifically.
I am able to do this this way at the moment:
$var = "some JS stuff";
$e = $htmlDOM->find("body", 0);
$e->outertext = '<body>' . $e->innertext . $var . '</body>';
My issue is that this hard-codes the <body> tag, but the actual HTML might have JS handlers, an id, or other attributes attached to <body>, and I would like to preserve them if that is the case.
The docs don't seem to show a method for this.
Is it possible?

After looking at the source code, I found that the way to go about it is to use
$var = "some JS stuff";
$e = $htmlDOM->find("body", 0);
$e->outertext = $e->makeup() . $e->innertext . $var . '</body>';
That is, the undocumented makeup() function will build the tag and any associated text/code.
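If relying on an undocumented method feels fragile, the same append can be sketched with PHP's bundled DOM extension. This is a different library from Simple HTML DOM, and the sample markup and script text below are made up for illustration. Because the node is appended rather than the tag rebuilt as a string, the attributes on <body> are left untouched.

```php
<?php
// Sample document; the id and class on <body> are illustrative assumptions.
$html = '<html><body id="main" class="home"><p>Hello</p></body></html>';

$doc = new DOMDocument();
$doc->loadHTML($html);

// Locate the existing <body> element, attributes and all.
$body = $doc->getElementsByTagName('body')->item(0);

// Build a <script> node and append it as the last child of <body>.
$script = $doc->createElement('script');
$script->appendChild($doc->createTextNode('console.log(1);'));
$body->appendChild($script);

$out = $doc->saveHTML();
echo $out;
```

Note that saveHTML() serializes the whole document and may add a doctype that the input did not have.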

Related

Passing URL parameter to link on page

I am trying to grab a parameter from a webpage and insert it into a URL link on that same page but am having problems with the syntax.
So, for example, the webpage is www.website.com?src=mm
Currently the code on the page that does not pull in the parameter is
<?php echo "<A HREF='http://www.website2.com?offer=AAt&sub1=422'><B>Click Here</B></A><BR>" ?>
I would like to include that "mm" parameter at the end of the URL so the final URL is:
http://www.website2.com?offer=AA&sub1=422&sub2=mm
I tried the following but does not work:
<?php echo "<B>Click Here</B><BR>" ?>
Any ideas on how to get this to work? Thanks
Your code doesn't even compile:
Parse error: syntax error, unexpected 'http' (T_STRING), expecting ',' or ';' in /var/www/html/ImagePT/test.php on line 1
it has to be
<?php echo '<B>Click Here</B><BR>'; ?>
but since I'm just in the mood to give you some further advice:
You don't have to write HTML tags in uppercase; it's rather unusual (not wrong, but you don't see it very often). More importantly, this script breaks when the $_GET['src'] variable is undefined, so I'd check whether it is set and then modify the URL accordingly. My advice would be to use the following:
<?php
if(isset($_GET['src']))
{
    echo '<b>Click Here</b><br>';
}
else
{
    echo '<b>Click Here</b><br>';
}
?>
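The link markup in the snippets above appears to have been lost, so both branches look the same. A minimal sketch of what the two cases might look like follows, assuming the target URL from the question. The function name buildOfferLink is hypothetical. It appends sub2 only when src is present and escapes the URL before placing it in the attribute.

```php
<?php
// Hypothetical helper: builds the link from the question, adding sub2
// only when a non-empty "src" parameter was passed to the page.
function buildOfferLink(array $get): string
{
    $params = array('offer' => 'AA', 'sub1' => '422');
    if (isset($get['src']) && is_string($get['src']) && $get['src'] !== '') {
        $params['sub2'] = $get['src'];
    }
    // http_build_query() URL-encodes each value; '&' is passed explicitly
    // so the result does not depend on the arg_separator.output ini setting.
    $url = 'http://www.website2.com?' . http_build_query($params, '', '&');
    // Escape for use inside an HTML attribute.
    return '<a href="' . htmlspecialchars($url, ENT_QUOTES, 'UTF-8') . '"><b>Click Here</b></a><br>';
}

echo buildOfferLink($_GET);
```

Building the query string from an array avoids the quoting mistake behind the parse error, since no quotes need to be nested by hand.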

Replacing caption between <a> with attributes

I'm trying to preg_replace a link caption as below. I can't find an example, though, where the replacement takes tag attributes into account rather than just clean tags.
Basically, this
Database Title
needs to become this
My Own Title
Help appreciated
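You can use a regular expression along with back references:
// The original link markup was lost; the href and class here are placeholders.
$html = '<a href="#" class="title">Database Title</a>';
$html = preg_replace("/(<.+>).+(<.+>)/", "$1My Own Title$2", $html);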
echo $html;
http://www.php.net/manual/en/regexp.reference.back-references.php
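The greedy pattern above can misfire when the string holds more than one tag pair. A slightly tighter sketch that only targets anchor tags and keeps their attributes is shown here; the function name replaceCaption and the sample attributes are hypothetical, and a regex like this is still only suitable for small, well-formed fragments.

```php
<?php
// Hypothetical helper: replaces the caption of every <a ...>...</a> pair
// while keeping the opening tag (with its attributes) and the closing tag.
function replaceCaption(string $html, string $caption): string
{
    return preg_replace_callback(
        '~(<a\b[^>]*>).*?(</a>)~is',
        function (array $m) use ($caption) {
            // $m[1] is the opening tag, $m[2] the closing tag.
            return $m[1] . htmlspecialchars($caption, ENT_QUOTES, 'UTF-8') . $m[2];
        },
        $html
    );
}

echo replaceCaption('<a href="page.html" class="title">Database Title</a>', 'My Own Title');
```

Using a callback instead of a replacement string avoids surprises if the new caption itself contains text such as $1.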

How to get the current view being rendered inside ZF2

Is there a way to get the current view being rendered inside zend framework 2?
I believe this should be possible with the event system but I can't seem to make it work.
The reason I want this information is so that I can automatically include a .js file with the same name; this would save me from having to specify this rule each time I'm inside an action.
Many thanks,
Tom
I'm not quite sure what you mean by rendering the current view inside ZF2, but here's how you can add a js file named after the action automatically. Just put this in your controller:
public function onDispatch(MvcEvent $mvcEvent)
{
    $renderer = $this->serviceLocator->get('Zend\View\Renderer\PhpRenderer');
    $actionName = $mvcEvent->getRouteMatch()->getParam('action');
    $jsFile = $actionName . '.js';
    $baseUrl = $mvcEvent->getRouter()->getBaseUrl();
    $renderer->headScript()->appendFile($baseUrl . '/js/' . $jsFile);
    return parent::onDispatch($mvcEvent);
}
You may need to adjust the code for your js file location and name of course. The onDispatch method is called automatically before the action.
Thank You Wunibald,
Your example worked perfectly, I have modified it below to be attached to an event so that it applies to every controller/module. To do this I have included it into the onBootstrap function in my Application module.
$events = StaticEventManager::getInstance();
$events->attach('Zend\\Mvc\\Application', 'dispatch', function(\Zend\EventManager\Event $event)
{
    $baseUrl = $event->getRouter()->getBaseUrl();
    $renderer = $event->getApplication()->getServiceManager()->get('Zend\View\Renderer\PhpRenderer');
    $action = $event->getRouteMatch()->getParam('action');
    $controller = $event->getRouteMatch()->getParam('controller');
    if (strlen($controller) > 0)
    {
        list($module, $_null, $controller) = explode('\\', $controller);
        $renderer->headScript()->appendFile($baseUrl . '/module/' . $module . '/view/' . strtolower($module) . '/' . strtolower($controller) . '/' . strtolower($action) . '.js');
        $renderer->headScript()->appendFile($baseUrl . '/module/' . $module . '/view/' . strtolower($module) . '/' . strtolower($module) . '.js');
    }
});
Once again thank you for pointing me in the right direction.
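To make the path convention in the listener above easier to check, here is a framework-free sketch of just the string building. The function name viewScriptUrls and the sample controller name are assumptions for illustration; it expects a controller name of the form Module\Controller\Name, as the explode() call above does.

```php
<?php
// Hypothetical helper: maps a controller class name and an action name to
// the two script URLs that the listener above appends.
function viewScriptUrls(string $baseUrl, string $controller, string $action): array
{
    $parts = explode('\\', $controller);
    if (count($parts) < 3) {
        return array(); // not in Module\Controller\Name form
    }
    list($module, , $name) = $parts;
    $dir = $baseUrl . '/module/' . $module . '/view/' . strtolower($module) . '/';
    return array(
        $dir . strtolower($name) . '/' . strtolower($action) . '.js', // per-action script
        $dir . strtolower($module) . '.js',                           // per-module script
    );
}

print_r(viewScriptUrls('', 'Application\\Controller\\Index', 'index'));
```

Keeping this logic in a plain function would also make it possible to test the convention without bootstrapping the application.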

nicedit - is it safe and is it affected by the site's css?

I'm considering using nicedit (http://nicedit.com/) for my site.
I assume that nicedit simply creates simple html using the buttons, and that html gets sent when the user saves it.
Is it recommended? Is someone still working on it?
Assuming I'm later displaying this HTML somewhere on my site, isn't it dangerous, since the user could plant malicious JavaScript? If not, how does nicedit prevent this?
Also, when I display this HTML later, will it be affected by my css? If so, how can I prevent this?
Thanks.
This is what I use. It works like a charm for cleaning out the content of the nicedit instance before chucking it into the database:
function cleanFromEditor($text) {
    // try to decode html before we clean it, then we submit to the database
    $text = stripslashes(html_entity_decode($text));
    // strip every tag that is not on this allow-list
    $text = strip_tags($text,'<p><div><strong><em><ul><ol><li><u><blockquote><br><sub><img><a><h1><h2><h3><span><b>');
    // conversion elements
    $conversion = array(
        '<br>'=>'<br />',
        '<b>'=>'<strong>',
        '</b>'=>'</strong>',
        '<i>'=>'<em>',
        '</i>'=>'</em>'
    );
    // clean up the old html with new
    foreach($conversion as $old=>$new){
        $text = str_replace($old, $new, $text);
    }
    // Note: mysql_real_escape_string() was removed in PHP 7; with MySQLi use
    // $mysqli->real_escape_string() or, better, a prepared statement.
    return htmlentities(mysql_real_escape_string($text));
}
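One caveat about tag allow-lists like the one above: strip_tags() removes disallowed tags but keeps the attributes on allowed ones, and it keeps the text that sat inside a removed tag. The sample input below is made up to show both effects.

```php
<?php
// Made-up input: an allowed <a> tag carrying an event handler, plus a
// disallowed <script> element.
$dirty = '<a href="#" onclick="alert(1)">link</a><script>alert(2)</script>';

// Allow only <a>; everything else is stripped.
$clean = strip_tags($dirty, '<a>');

echo $clean, "\n";
```

The onclick handler is still present in $clean, so an allow-list of tags alone does not stop script injection through attributes.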
It doesn't appear to be maintained anymore. But I have used it for purposes where I needed just a simple/lightweight WYSIWYG editor. If you are looking for something that gets constant core updates or additional features I wouldn't count on it. I finally broke down and wrote a lot of my own features like tables and YouTube videos.
Yes, a hacker could use it to post a client-side and/or server-side exploit on your site. But this is a threat you face with any editor. You need to filter the input in two ways.
You need to prevent SQL injection by sanitizing your POST variables. I always put this at the beginning of my scripts to clean them, and then read them from $input['whateveryouarepassing'] instead of $_POST['whateveryouarepassing']. Edit the $mysqli->real_escape_string() parts to work with your database object. Use MySQLi or PDO with prepared statements to harden your code further against this attack.
$input = array();
if (isset($_POST)) {
    foreach ($_POST as $key => $value) {
        // get_magic_quotes_gpc() was removed in PHP 8, so check that it exists first
        if (function_exists('get_magic_quotes_gpc') && @get_magic_quotes_gpc()) {
            $key = stripslashes($key);
            $value = stripslashes($value);
        }
        $key = $mysqli->real_escape_string($key);
        $value = $mysqli->real_escape_string($value);
        $input[$key] = $value;
    }
}
Then I like to clean it with this function I put together over the years with various methods of cleaning out bad code. Use HTML Purifier instead if you can set it up. If not, here is this bad boy. Call it with cleanHTML($input['whateveryouarepassing']);.
function cleanHTML($string) {
$string = preg_replace('#(&\#*\w+)[\x00-\x20]+;#u', "$1;", $string);
$string = preg_replace('#(&\#x*)([0-9A-F]+);*#iu', "$1$2;", $string);
$string = html_entity_decode($string, ENT_COMPAT, "UTF-8");
$string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])(on|xmlns)[^>]*>#iUu', "$1>", $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*j[\x00-\x20]*a[\x00-\x20]*v[\x00-\x20]*a[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2nojavascript...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*v[\x00-\x20]*b[\x00-\x20]*s[\x00-\x20]*c[\x00-\x20]*r[\x00-\x20]*i[\x00-\x20]*p[\x00-\x20]*t[\x00-\x20]*:#iUu', '$1=$2novbscript...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*-moz-binding[\x00-\x20]*:#Uu', '$1=$2nomozbinding...', $string);
$string = preg_replace('#([a-z]*)[\x00-\x20\/]*=[\x00-\x20\/]*([\`\'\"]*)[\x00-\x20\/]*data[\x00-\x20]*:#Uu', '$1=$2nodata...', $string);
$string = preg_replace('#(<[^>]+[\x00-\x20\"\'\/])style[^>]*>#iUu', "$1>", $string);
$string = preg_replace('#</*\w+:\w[^>]*>#i', "", $string);
$string = preg_replace('/^<\?php(.*)(\?>)?$/s', '$1', $string);
$string = preg_replace('#</*(applet|meta|xml|blink|link|script|embed|object|frame|iframe|frameset|ilayer|layer|bgsound|title|base)[^>]*>#i', "", $string);
return $string;
}
The HTML will be affected by your CSS both while editing and when displayed. You will need to write additional CSS rules if this is an issue. If the issue is during editing, move to an iframe-based editor; to keep your CSS from affecting the displayed content, show the HTML inside an iframe.
If you want another suggestion, elRTE is my go-to editor these days. It is a little more advanced, but totally worth it once you get to know the code base and API. You will face the same issues as above, as you will with any editor, except for the CSS during editing, since elRTE is frame-based and you can specify stylesheets.
Edit: I posted this assuming you were using PHP. Apologies if not.

How to parse a remote website and create a link on every single word for a dictionary tooltip?

I want to parse a random website, modify the content so that every word is a link (for a dictionary tooltip) and then display the website in an iframe.
I'm not looking for a complete solution, but for a hint or a possible strategy. The linking is my problem, parsing the website and displaying it in an iframe is quite simple. So basically I have a String with all the html content. I'm not even sure if it's better to do it serverside or after the page is loaded with JS.
I'm working with Ruby on Rails, jQuery, jRails.
Note: The content of the href tag depends on the word.
Clarification:
I tried a regexp and it already kind of works:
@site.gsub!(/[A-Za-z]+(?:['-][A-Za-z]+)?|\d+(?:[,.]\d+)?/) {|word| '' + word + ''}
But the problem is to only replace words in the text and leave the HTML as it is. So I guess it is a regex problem...
Thanks for any ideas.
I don't think a regexp is going to work for this - or, at least, it will always be brittle. A better way is to parse the page using Hpricot or Nokogiri, then go through it and modify the nodes that are plain text.
It sounds like you have it mostly planned out already.
Split the content into words and then for each word, create a link, such as whatever
EDIT (based on your comment):
Ahh ... I recommend you search around for screen scraping techniques. Most of them should start with removing anything between < and > characters, and replacing <br> and <p> with newlines.
I would use Nokogiri to remove the HTML structure before you use the regex.
no_html = Nokogiri::HTML(html_as_string).text
Simple. Hash the HTML, run your regex, then unhash the HTML.
<?php
class ht
{
    static $hashes = array();

    # hashes everything that matches $pattern and saves matches for later unhashing
    static function hash($text, $pattern) {
        return preg_replace_callback($pattern, array(__CLASS__, 'push'), $text);
    }

    # hashes all html tags and saves them
    static function hash_html($html) {
        return self::hash($html, '`<[^>]+>`');
    }

    # hashes and saves $value, returns key
    static function push($value) {
        if(is_array($value)) $value = $value[0];
        static $i = 0;
        $key = "\x05".++$i."\x06";
        self::$hashes[$key] = $value;
        return $key;
    }

    # unhashes all saved values found in $text
    static function unhash($text) {
        return str_replace(array_keys(self::$hashes), self::$hashes, $text);
    }

    static function get($key) {
        return self::$hashes[$key];
    }

    static function clear() {
        self::$hashes = array();
    }
}
?>
Example usage:
$hashed = ht::hash_html($your_html);
// your word->href converter here, turning $hashed into $your_formatted_html
$result = ht::unhash($your_formatted_html);
Oh... right, I wrote this in PHP. Guess you'll have to convert it to ruby or js, but the idea is the same.
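A shorter variant of the same idea, sketched in PHP to match the class above: split the markup on tags, link words only in the text pieces, and join everything back together. The function name linkWords and the /dict/ href prefix are hypothetical, and the word pattern is the letters-only part of the regex from the question.

```php
<?php
// Hypothetical helper: wraps every word found outside of tags in a link.
function linkWords(string $html, string $baseHref = '/dict/'): string
{
    // With PREG_SPLIT_DELIM_CAPTURE the result alternates between text
    // (even indexes) and the captured tags (odd indexes).
    $parts = preg_split('~(<[^>]+>)~', $html, -1, PREG_SPLIT_DELIM_CAPTURE);
    foreach ($parts as $i => $part) {
        if ($i % 2 === 1) {
            continue; // a tag: leave it untouched
        }
        $parts[$i] = preg_replace_callback(
            "~[A-Za-z]+(?:['-][A-Za-z]+)?~",
            function (array $m) use ($baseHref) {
                return '<a href="' . $baseHref . rawurlencode($m[0]) . '">' . $m[0] . '</a>';
            },
            $part
        );
    }
    return implode('', $parts);
}

echo linkWords('<p class="x">Hi there</p>');
```

This sketch does not special-case HTML entities or the contents of script and style elements, so a parser-based approach (as suggested earlier in the thread) is still the more robust route for arbitrary pages.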

Resources