Parse a Facebook page using BeautifulSoup

I'm searching for a name in a Facebook HTML page.
If I load the file html.txt like this:
html = open('html.txt','r').read()
soup = BeautifulSoup(html)
Searching for the name with str.find seems to work, but when I search with BS I can't find anything.
>>>html.find("Joseph Tan")
98939
>>>html[98700:99000]
'<div class="fwn fcg"><span class="fcg"><span class="fwb"><a class="profileLink" href="https://www.facebook.com/ASD.391" data-ft="{"tn":"l"}" data-hovercard="/ajax/hovercard/user.php?id=123456">Alex Tan</a></span> condivided the photo <a class="profileLink" '
>>> soup.findAll('div',{'class':'fwn fcg'})
[]
>>> soup.findAll('span',{'class':'fwb'})
[]
>>> soup.findAll('a',{'class':'profileLink'})
[]
>>>
Can someone help me? Thanks a lot.
EDIT: RE-CREATED HTML PAGE
html page

It is working as below:
print soup.find_all('div', class_=['fwn','fcg'])
OUTPUT:
[<div class="uiHeaderActions rfloat _ohf fsm fwn fcg"><a class="_1c1m" href="#" role="button">Segna tutti come già letti</a> · <a accesskey="m" ajaxify="/ajax/messaging/composer.php" href="/messages/new/" id="u_0_8" rel="dialog" role="button">Invia un nuovo messaggio</a></div>, <div class="uiHeaderActions fsm fwn fcg">Segna come già letto · Impostazioni</div>, <div class="fsm fwn fcg"><a ajaxify="/settings/language/language/?uri=https%3A%2F%2Fwww.facebook.com%2Fshares%2Fview%3Fid%3D10152555113196961&source=TOP_LOCALES_DIALOG" href="#" rel="dialog" role="button" title="Usa Facebook in un'altra lingua.">Italiano</a></div>]
According to this link, this is how to search for classes and other HTML elements using BS. Please check.
There were two problems.
1. The way you wrote the search does not match the link I provided above. Maybe you are not using an updated version of BS.
2. There are two classes, 'fwn' and 'fcg', so you have to give their names in a list. This is how I got the output.
The same applies to 'span' and 'a', as below:
print soup.find_all('span', class_='jewelCount')
print soup.find_all('a', class_='_awj')
The 'span' with class 'fwb' and the 'a' with class 'profileLink' that you gave were not found because they are not present in the HTML.
You can check by printing all spans and a's.
Write print soup.find_all('a') and print soup.find_all('span') to check for yourself.
Hope this helps. If not, write again! :)
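To make the class matching concrete, here is a minimal sketch that uses only Python's standard library html.parser rather than BS itself. It illustrates that a class attribute is a whitespace-separated list of tokens, so an element with class="fsm fwn fcg" carries 'fwn' and 'fcg' individually. The names ClassFinder, find_by_classes and sample are made up for this illustration. Note that this sketch requires all requested tokens to be present, which is stricter than passing a list to class_ in BS (that matches an element carrying any of the listed classes).

```python
from html.parser import HTMLParser


class ClassFinder(HTMLParser):
    """Collect start tags whose class attribute contains every wanted token."""

    def __init__(self, tag, wanted):
        super().__init__()
        self.tag = tag
        self.wanted = set(wanted)
        self.matches = []

    def handle_starttag(self, tag, attrs):
        # The class attribute is a whitespace-separated list of tokens.
        tokens = set((dict(attrs).get("class") or "").split())
        if tag == self.tag and self.wanted <= tokens:
            self.matches.append(sorted(tokens))


def find_by_classes(html, tag, wanted):
    """Return the sorted class tokens of every matching element."""
    finder = ClassFinder(tag, wanted)
    finder.feed(html)
    return finder.matches


# Hypothetical sample markup, loosely modelled on the output above.
sample = (
    '<div class="fsm fwn fcg">a</div>'
    '<div class="fwn">b</div>'
    '<span class="fwn fcg">c</span>'
)
print(find_by_classes(sample, "div", ["fwn", "fcg"]))  # [['fcg', 'fsm', 'fwn']]
```

If a search like this returns an empty list for a class you can see in the raw text, printing every tag of that type, as suggested above, is a quick way to see what the parser actually produced.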

Related

Thymeleaf does not allow "lt" as query string parameter name

I cannot use "lt" as query string parameter name in thymeleaf. How can I achieve that?
This is my example code:
<a th:href="@{/payment/otp-resend(lt=${landingToken.sessionId})}" class="sifretekrar" th:text="#{lp.resendOtp}"></a>
And it gives the following error:
org.thymeleaf.exceptions.TemplateProcessingException: Could not parse as expression: "@{/payment/otp-resend(lt=${landingToken.sessionId})}" (template: "otp-entry-page" - line 70, col 12)
at org.thymeleaf.standard.expression.StandardExpressionParser.parseExpression(StandardExpressionParser.java:131)
at org.thymeleaf.standard.expression.StandardExpressionParser.parseExpression(StandardExpressionParser.java:62)
at org.thymeleaf.standard.expression.StandardExpressionParser.parseExpression(StandardExpressionParser.java:44)
at org.thymeleaf.engine.EngineEventUtils.parseAttributeExpression(EngineEventUtils.java:220)
at org.thymeleaf.engine.EngineEventUtils.computeAttributeExpression(EngineEventUtils.java:207)
at org.thymeleaf.standard.processor.AbstractStandardExpressionAttributeTagProcessor.doProcess(AbstractStandardExpressionAttributeTagProcessor.java:125)
at org.thymeleaf.processor.element.AbstractAttributeTagProcessor.doProcess(AbstractAttributeTagProcessor.java:74)
at org.thymeleaf.processor.element.AbstractElementTagProcessor.process(AbstractElementTagProcessor.java:95)
at org.thymeleaf.util.ProcessorConfigurationUtils$ElementTagProcessorWrapper.process(ProcessorConfigurationUtils.java:633)
at org.thymeleaf.engine.ProcessorTemplateHandler.handleOpenElement(ProcessorTemplateHandler.java:1314)
at org.thymeleaf.engine.OpenElementTag.beHandled(OpenElementTag.java:205)
at org.thymeleaf.engine.TemplateModel.process(TemplateModel.java:136)
at org.thymeleaf.engine.TemplateManager.parseAndProcess(TemplateManager.java:661)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1098)
at org.thymeleaf.TemplateEngine.process(TemplateEngine.java:1072)
at org.thymeleaf.spring5.view.ThymeleafView.renderFragment(ThymeleafView.java:362)
at org.thymeleaf.spring5.view.ThymeleafView.render(ThymeleafView.java:189)
at org.springframework.web.servlet.DispatcherServlet.render(DispatcherServlet.java:1370)
at org.springframework.web.servlet.DispatcherServlet.processDispatchResult(DispatcherServlet.java:1116)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1055)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:998)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:890)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:634)
at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:875)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:741)
Best regards.
EDIT
The accepted answer is correct. However, IntelliJ IDEA shows it as if it has an error. The screenshot is attached below. The following two lines work, although the IDE displays an error message for both of them:
<a th:href="@{/payment/mps-otp-resend} + '?lt=' + ${landingToken.sessionId}" class="sifretekrar" th:text="#{lp.resendOtp}"></a>
<a th:href="@{/payment/mps-otp-resend('lt'=${landingToken.sessionId})}" class="sifretekrar" th:text="#{lp.resendOtp}"></a>
You can quote lt, which should allow you to use it as a parameter name:
<a th:href="@{/payment/otp-resend('lt'=${landingToken.sessionId})}" class="sifretekrar" th:text="#{lp.resendOtp}"></a>

How to show String new lines on gsp grails file?

I've stored a string in the database. When I save and retrieve the string, the result I get is the following:
This is my new object
Testing multiple lines
-- Test 1
-- Test 2
-- Test 3
That is what I get from a println command when I call the save and index methods.
But when I show it on screen, it is displayed like this:
This is my object Testing multiple lines -- Test 1 -- Test 2 -- Test 3
Already tried to show it like the following:
${adviceInstance.advice?.encodeAsHTML()}
But still the same thing.
Do I need to replace \n with <br> or something like that? Is there an easier way to show it properly?
Common problems have a variety of solutions
1> You could replace \n with <br>,
either in your controller/service or, if you prefer, in the GSP:
${adviceInstance.advice?.replace('\n','<br>')}
2> display the content in a read-only textarea
<g:textArea name="something" readonly="true">
${adviceInstance.advice}
</g:textArea>
3> Use the <pre> tag
<pre>
${adviceInstance.advice}
</pre>
4> Use css white-space http://www.w3schools.com/cssref/pr_text_white-space.asp:
<div class="space">
</div>
//css code:
.space {
white-space:pre
}
Also note that if you have a strict constraint on the stored field (such as the maxSize below), a value submitted via a form can carry additional hidden characters. I didn't delve into exactly what they were; they may have been carriage returns (\r), as explained in the comments below. A good rule is to define a setter that trims the value each time it is received, i.e.:
class Advice {
    String advice
    static constraints = {
        advice(nullable:false, minSize:1, maxSize:255)
    }
    /*
     * In this scenario with a maxSize value, ensure you
     * set your own setter to trim any hidden \r
     * that may be posted back as part of the form request
     * by end user. Trust me I got to know the hard way.
     */
    void setAdvice(String adv) {
        // Safe navigation avoids a NullPointerException on a null value
        advice = adv?.trim()
    }
}
${raw(adviceInstance.advice?.encodeAsHTML().replace("\n", "<br>"))}
This is how I solved the problem.
Firstly make sure the string contains \n to denote line break.
For example :
String test = "This is first line. \n This is second line";
Then in gsp page use:
${raw(test?.replace("\n", "<br>"))}
The output will be as:
This is first line.
This is second line.
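The order of operations in the answers above (HTML-encode first, then turn line breaks into <br>) is the important part, so that user-supplied markup stays escaped while the inserted <br> tags do not. As a language-neutral illustration, here is a minimal Python sketch of the same idea. It is not Grails code; the function name to_html_lines and the sample string are made up for the example, and it also normalises the \r characters mentioned earlier.

```python
import html


def to_html_lines(text):
    """Escape HTML first, then turn line breaks into <br> tags."""
    escaped = html.escape(text)
    # Normalise Windows (\r\n) and bare \r line endings before replacing.
    normalised = escaped.replace("\r\n", "\n").replace("\r", "\n")
    return normalised.replace("\n", "<br>")


# Hypothetical stored value containing both markup and mixed line endings.
advice = "This is my new object\r\nTesting <multiple> lines\n-- Test 1"
print(to_html_lines(advice))
# This is my new object<br>Testing &lt;multiple&gt; lines<br>-- Test 1
```

Doing the replacement before escaping would escape the <br> tags too, which is why the GSP version wraps the already-encoded string in raw().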

Looking for guidelines on Razor syntax in ASP.NET MVC

I am learning ASP.NET MVC and going through an online tutorial.
1) Consider <span>@model.Message</span> and @Html.Raw(model.Message).
Suppose "Hello Word" is stored in Message; then "Hello Word" should be displayed if I write a statement like
<span>@model.Message</span>, but I could not understand the special purpose of @Html.Raw(model.Message).
What will @Html.Raw() render?
Please discuss with a few more examples so I can understand the difference well.
2) Consider the two snippets below:
@if (foo) {
<text>Plain Text</text>
}
@if (foo) {
@:Plain Text is @bar
}
In which version of HTML was the <text> tag introduced? Is it equivalent to another tag, or what? What is the purpose of
this tag?
Also tell me about this: @:Plain Text is @bar
What is the special meaning of @: ?
If our intention is to mix text with an expression, can't we just write Plain Text is @bar ?
3) <span>ISBN@(isbnNumber)</span>
What will it print? If 2000 is stored in the isbnNumber variable, then it may print <span>ISBN2000</span>. Am I right?
So tell me, what is the special meaning of @(variable-name)? Why the brackets along with the @ symbol?
4) Consider
<span>In Razor, you use the
@@foo to display the value
of foo</span>
If foo has the value "god", what will @@foo print?
5) See this and guide me point by point on the syntax below:
a) @(MyClass.MyMethod<AType>())
b)
@{
Func<dynamic, object> b =
@<strong>@item</strong>;
}
@b("Bold this")
c) <div class="@className foo bar"></div>
6) See this:
@functions
{
string SayWithFunction(string message)
{
return message;
}
}
@helper SayWithHelper(string message)
{
Text: @message
}
@SayWithFunction("Hello, world!")
@SayWithHelper("Hello, world!")
What are they trying to declare? Functions?
What kind of syntax is it?
It seems that two functions have been declared in two different ways. Please explain these points with more samples. Thanks.
A few more questions:
7)
@{
Func<dynamic, object> b = @<strong>@item</strong>;
}
<span>This sentence is @b("In Bold").</span>
What is the meaning of the above? Is it an anonymous delegate?
When someone calls @b("In Bold"), what will happen?
8)
@{
var items = new[] { "one", "two", "three" };
}
<ul>
@items.List(@<li>@item</li>)
</ul>
Tell me something about the List() function. Where does the item variable come from?
9)
@{
var comics = new[] {
new ComicBook {Title = "Groo", Publisher = "Dark Horse Comics"},
new ComicBook {Title = "Spiderman", Publisher = "Marvel"}
};
}
<table>
@comics.List(
@<tr>
<td>@item.Title</td>
<td>@item.Publisher</td>
</tr>)
</table>
Please explain the above code briefly. Thanks.
1) Any kind of @Variable output makes MVC automatically encode the value. That is to say, if foo = "Joe & Dave", then @foo becomes Joe &amp; Dave automatically. To bypass this behavior you have @Html.Raw.
2) <text></text> is there to help you when the parser is having trouble. Keep in mind that Razor goes in and out of HTML/code using the semantics of the languages. That is to say, it knows it's in HTML using the XML parser, and it knows it's in C#/VB by the syntax (braces or Then..End respectively). When you want to stray from this format, you can use <text>, e.g.:
<ul>
<li>
@foreach (var item in items) {
@item.Description
<text></li><li></text>
}
</li>
</ul>
Here you're messing with the parser because the markup no longer conforms to "standard" HTML blocks. The </li> would throw Razor for a loop, but because it's wrapped in <text></text>, Razor has a more definitive way of knowing where code ends and HTML begins.
3) Yes, the parentheses are there to give the parser an explicit definition of what should be executed. Razor makes its best attempt to understand what you're trying to output, but sometimes it's off. The parentheses solve this, e.g.:
@Foo.bar
If you only had @Foo defined as a string, Razor would inevitably try to look for a bar property because it follows C#'s naming convention (this would be perfectly valid notation in C#, but it is not our intent). So, to stop it from continuing on, we can use parentheses:
@(Foo).bar
A notable exception to this is when there is a single trailing period, e.g.:
Hello, @name.
The Razor parser realizes nothing valid (in terms of the language) follows, so it just outputs name and a period thereafter.
4) @@ is the escape sequence for Razor when you need to actually print @. So, in your example, you'd see @foo on the page in plain text. This is useful when outputting email addresses directly on the page, e.g.:
bchristie@@contoso.com
Now Razor won't look for a contoso.com variable.
5) You're seeing various shortcuts and usages of how you bounce between valid C# code and HTML. Remember that you can go between the two, and the HTML you're seeing is really just a compiled IHtmlString that is finally output to the buffer.
1.
By default, Razor automatically HTML-encodes your output values (<div> becomes &lt;div&gt;). @Html.Raw should be used when you explicitly want to output the value as-is without any encoding (very common for outputting JSON strings in the middle of a <script>).
2.
The purpose of <text> and @: is to escape the regular Razor syntax flow and output literal text values. For example:
// i just want to print "Haz It" if some condition is true
@if (Model.HasSomething) { Haz It } // syntax error
@if (Model.HasSomething) { <text>Haz It</text> } // just fine
As for @:, it begins a text literal that runs until the next line feed (Enter), so:
@if (Model.HasSomething) { @:Haz It } // syntax error, no closing '}' encountered
// just fine
@if (Model.HasSomething)
{
@:Haz It
}
3.
By default, if your @ is inside single or double quotes (<tr id="row@item.Id">), Razor interprets it as a literal and will not try to parse it as an expression (for obvious reasons). Sometimes you do want it parsed, and then you simply write <tr id="row@(item.Id)">.
4.
The purpose of @@ is simply to escape '@', for when you want to output '@' and don't want Razor to interpret it as an expression. In your case, @@foo would print '@foo'.
5.
a. @(MyClass.MyMethod<AType>()) would simply output the return value of the method (using ToString() if necessary).
b. Yes, Razor does let you define a kind of inline function, but usually you are better off using Html Helpers / Functions / DisplayTemplates (as follows).
c. See above.
6.
As for Razor helpers, see http://weblogs.asp.net/scottgu/archive/2011/05/12/asp-net-mvc-3-and-the-helper-syntax-within-razor.aspx

How do you include hashtags within Twitter share link text?

I'm writing a site with a custom tweet button that uses the www.twitter.com/share function, however the problem I am having is including hash '#' characters within the tweet text.
For example:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+#branstonpickel+right+now
The tweet text comes out as 'I am eating' and omits the hash and everything after.
I had a quick look on the Twitter forums and learnt the hash '#' character cannot be part of the share url. On https://dev.twitter.com/discussions/512#comment-877 it was said that:
Hashes are special characters in the URL (they identify document fragments) so they, and anything following, does not get sent the server.
and
you need to URLEncode it, so use %23
When I tried the 2nd point in my test link:
www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branstonpickel+right+now
The tweet text came out as 'I am eating %23branstonpickel right now' literally including %23 instead of converting it to a hash.
Sorry for the waffely question, but does anyone know what it is I'm doing wrong?
Any feedback would be greatly appreciated :)
It looks like this is the basic setup:
https://twitter.com/intent/tweet?
url=<url to tweet>
text=<text to tweet>
hashtags=<comma separated list of hashtags, with no # on them>
This would pre-build a tweet of: <text> <url> <hashtags>
The above example would be:
https://twitter.com/intent/tweet?url=http://www.example.com&text=I+am+eating+branston+pickel+right+now&hashtags=bransonpickel,pickles
There used to be a bug with the hashtags parameter: it only showed the first n-1 hashtags. This is now fixed.
You can use %23 instead of the hash (#) in the URL, e.g.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+%23branston+%23pickel+right+now
I may be wrong, but I think the hashtag has to be passed as a separate variable that will appear at the end of your tweet, i.e.:
http://www.twitter.com/share?url=www.example.com&text=I+am+eating+branston+pickel+right+now&hashtag=bransonpickel
will result in "I am eating branston pickel right now #branstonpickle"
On a separate note, I think pickel should be pickle!
Cheers
Toby
use encodeURIComponent to encode the url
If you're using PHP, you can use the following:
<?php echo 'http://www.twitter.com/share?' . http_build_query(array(
'url' => 'http://www.example.com',
'text' => 'I am eating #branstonpickel right now'
)); ?>
This will do all the URL encoding for you, and it's easy to read.
For more information on the http_build_query, see the PHP manual:
http://us2.php.net/http_build_query
For a URL with line breaks, #, @ and special Unicode characters in it, the following works:
var lineJump = encodeURI(String.fromCharCode(10)),
hash = "%23", arobase="%40",
tweetText = 'https://twitter.com/intent/tweet?text=Le signe chinois '+hans+' '+item.pinyin+': '+item.definition.replace(";",",")+'.'
+lineJump+'Merci '+arobase+'Inalco_Officiel '+arobase+'CRIparis ❤️🇨🇳 '
+lineJump+hash+'Chinois '+hash+'MOOC'
+lineJump+'https://hanzi.cri-paris.org/',
tweetTxtUrlEncoded = tweetText+ "" +encodeURIComponent('#'+lesson+encodeURIComponent(hans));
Use urlencode:
https://twitter.com/intent/tweet?text=<?= urlencode("I am eating #branstonpickel right now"); ?>
You can just use this code and modify it
%20 means space
%23 means hashtag (#)
In JS you can easily encode the special characters using encodeURIComponent.
(Warning: don't use encodeURI, as "@" and "#" are not escaped.)
Here's an example with mention and hashtag:
const text = "Hello #world ! Go follow @StackOverflow";
const tweetUrl = `https://twitter.com/intent/tweet?text=${ encodeURIComponent(text) }`;
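The same percent-encoding can be sketched in Python with the standard library, for anyone building the link server-side. This is only an illustration: the helper name tweet_intent_url is made up, while the endpoint and the text, url and hashtags parameters are the ones listed in the answer above.

```python
from urllib.parse import quote, urlencode


def tweet_intent_url(text, url=None, hashtags=()):
    """Build a tweet intent link with every parameter percent-encoded."""
    params = {"text": text}
    if url:
        params["url"] = url
    if hashtags:
        # Hashtags are passed without the leading '#', comma separated.
        params["hashtags"] = ",".join(hashtags)
    # quote_via=quote encodes spaces as %20 (not '+') and '#' as %23.
    return "https://twitter.com/intent/tweet?" + urlencode(params, quote_via=quote)


link = tweet_intent_url(
    "I am eating #branstonpickel right now",
    url="http://www.example.com",
)
print(link)
# https://twitter.com/intent/tweet?text=I%20am%20eating%20%23branstonpickel%20right%20now&url=http%3A%2F%2Fwww.example.com
```

Because the literal # never appears in the generated link, nothing after it is treated as a URL fragment and dropped before the request is sent.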

Generate a file list based on an array

I tried a few things, but this week I feel like my brain is on holiday and I need to complete this, so I hope someone can help me.
I need to create a file list based on an array which is saved in a database. The array looks like this:
['file1', 'dir1/file2', 'dir1/subdir1/file3']
Output should be like this:
file1
dir1
  file2
  subdir1
    file3
in HTML, preferably like this (so I can extend it with JS for folding and multi-select):
<ul>
  <li>file1</li>
  <li>dir1</li>
  <ul>
    <li>file2</li>
    <li>subdir1</li>
    <ul>
      <li>file3</li>
    </ul>
  </ul>
</ul>
I'm using Ruby on Rails and am trying to achieve this in an RJS template, but that doesn't really matter. You can also help me with some detailed pseudo-code.
Does anyone know how to solve this?
Edit
Thanks to everyone for these solutions. Listing works, and I extended it to a foldable solution to show/hide directory contents. I still have one problem: the code is meant to put complete file paths into checkboxes behind the entries, for a synchronisation. Based on sris' solution, I can only read the current file and its children, but not the whole path from the root. For a better understanding:
Currently:
[x] dir1
  [x] dir2
    [x] file1
gives me
a checkbox with the same value as the displayed text, e.g. "file1" for [x] file1. But what I need is the full path, e.g. "dir1/dir2/file1" for [x] file1.
Does someone have another hint how to add this?
Here's a quick implementation you can use for inspiration. This implementation disregards the order of files in the input Array.
I've updated the solution to save the entire path as you required.
dirs = ['file1', 'dir1/file2', 'dir1/subdir1/file3', 'dir1/subdir1/file5']
tree = {}
dirs.each do |path|
  current = tree
  path.split("/").inject("") do |sub_path,dir|
    sub_path = File.join(sub_path, dir)
    current[sub_path] ||= {}
    current = current[sub_path]
    sub_path
  end
end
def print_tree(prefix, node)
  puts "#{prefix}<ul>"
  node.each_pair do |path, subtree|
    puts "#{prefix} <li>[#{path[1..-1]}] #{File.basename(path)}</li>"
    print_tree(prefix + " ", subtree) unless subtree.empty?
  end
  puts "#{prefix}</ul>"
end
print_tree "", tree
This code will produce properly indented HTML like your example. But since Hashes in Ruby (1.8.6) aren't ordered the order of the files can't be guaranteed.
The output produced will look like this:
<ul>
 <li>[dir1] dir1</li>
 <ul>
  <li>[dir1/subdir1] subdir1</li>
  <ul>
   <li>[dir1/subdir1/file3] file3</li>
   <li>[dir1/subdir1/file5] file5</li>
  </ul>
  <li>[dir1/file2] file2</li>
 </ul>
 <li>[file1] file1</li>
</ul>
I hope this serves as an example of how you can get both the path and the filename.
Think tree.
# setup phase
for each pathname p in list
do
add_path_to_tree(p)
od
walk tree depth first, emitting HTML
add_path_to_tree is recursive
given pathname p
parse p into first_element, rest
# that is, "foo/bar/baz" becomes "foo", "bar/baz"
add first_element to tree
add_path_to_tree(rest)
I'll leave the optimal data structure for the tree (a list of lists) as an exercise.
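The pseudo-code above can be sketched as runnable code. The following is one possible Python version, not the only way to do it: it uses nested dictionaries instead of a list of lists, the function names build_tree and render are made up, and data-path is a hypothetical attribute chosen here to carry the full path asked for in the question's edit. Unlike the question's sample markup, it nests each child <ul> inside its parent <li>.

```python
import html


def build_tree(paths):
    """Nest each path's components into dictionaries (name -> children)."""
    tree = {}
    for path in paths:
        node = tree
        for part in path.split("/"):
            node = node.setdefault(part, {})
    return tree


def render(node, parent="", indent=""):
    """Walk the tree depth first and return the lines of a nested <ul>."""
    lines = [f"{indent}<ul>"]
    for name, children in node.items():
        full_path = f"{parent}/{name}" if parent else name
        # data-path is a hypothetical attribute holding the path from the root.
        item = f'{indent}  <li data-path="{html.escape(full_path)}">{html.escape(name)}'
        if children:
            lines.append(item)
            lines.extend(render(children, full_path, indent + "    "))
            lines.append(f"{indent}  </li>")
        else:
            lines.append(item + "</li>")
    lines.append(f"{indent}</ul>")
    return lines


tree = build_tree(["file1", "dir1/file2", "dir1/subdir1/file3"])
print("\n".join(render(tree)))
```

Python dictionaries keep insertion order, so entries appear in the order their paths were first seen; sort node.items() if a fixed order is needed.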
Expanding on sris's answer, if you really want everything sorted and the files listed before the directories, you can use something like this:
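def files_first_traverse(prefix, node = {})
  puts "#{prefix}<ul>"
  node_list = node.sort
  # First pass: entries without children (files)
  node_list.each do |base, subtree|
    puts "#{prefix} <li>#{base}</li>" if subtree.empty?
  end
  # Second pass: entries with children (directories), recursing into each
  node_list.each do |base, subtree|
    next if subtree.empty?
    puts "#{prefix} <li>#{base}</li>"
    files_first_traverse(prefix + ' ', subtree)
  end
  # Double quotes are required here so that #{prefix} is interpolated
  puts "#{prefix}</ul>"
end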

Resources