I am building a simple blog page where I wish to use markdown as the text format.
I have a working page when running in Dartium, but when I compile to JavaScript the Markdown does not come out properly formatted. I think it's only missing paragraphs; headers and lists are working fine.
I'm displaying the blog post in a Polymer element and reading in a simple file from the server. I have made a simple sample without Polymer which seems to work fine, but I haven't tried it on the production server.
The basic code is outlined below; any tips, or a better way of doing this? I will eventually move the posts to a database as text, but I'm open to suggestions for other ways of presenting blog posts with some simple formatting. Thanks.
getPostsFromServer() {
  String path = 'post1.md';
  HttpRequest req = new HttpRequest();
  req
    ..open('GET', path)
    ..onLoadEnd.listen((e) => printPost(req))
    ..send('');
}

void printPost(HttpRequest req) {
  var postdiv = $['article'];
  if (req.status == 200) {
    var postText = req.responseText;
    print(postText);
    postdiv.innerHtml = markdownToHtml(postText);
  } else {
    postdiv.innerHtml = 'Failed to load newsletter, sorry.';
  }
}
I am trying to scrape the results from a Quora search query using ImportXML.
The URL is of this form: https://www.quora.com/search?q=scrape%20Quora&time=year
I've tried using ImportXML, and can't get anything to work. As an example, I inspected the questions, and found they were inside a div with a class name of 'q-text puppeteer_test_question_title'. So I tried to import like this, but I just get #N/A:
importxml("https://www.quora.com/search?q=scrape%20Quora&time=year","//div[#class='q-text puppeteer_test_question_title']")
This is clearly not working: is there a fix, or is it just not possible (and why)?
Thank you.
Quora (as of now) runs on JavaScript, and Google Sheets import formulae do not support scraping JavaScript-rendered elements.
You can try to fetch the first 3 responses this way (quickly written, could be improved):
function myFunction() {
  var options = {
    'muteHttpExceptions': true,
    'followRedirects': false
  };
  var url = 'https://www.quora.com/search?q=scrape%20Quora&time=year';
  var jsonStrings = UrlFetchApp.fetch(url, options).getContentText().split('window.ansFrontendGlobals.data.inlineQueryResults.results["');
  jsonStrings.forEach((jsonString, i) => {
    if (i > 0) {
      console.log(jsonString.split('"] = ')[1].split('\n')[0]);
    }
  });
}
and then parse the complex JSON inside. However, additional answers are only transmitted by Quora when you scroll down, via asynchronous AJAX requests.
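A minimal sketch of that parsing step (the function name logSearchResults is illustrative, and it assumes each extracted snippet is valid JSON; it only logs the top-level keys, since the exact structure of Quora's payload needs to be inspected first):
function logSearchResults() {
  var options = { 'muteHttpExceptions': true, 'followRedirects': false };
  var url = 'https://www.quora.com/search?q=scrape%20Quora&time=year';
  var jsonStrings = UrlFetchApp.fetch(url, options).getContentText()
      .split('window.ansFrontendGlobals.data.inlineQueryResults.results["');
  jsonStrings.forEach((jsonString, i) => {
    if (i > 0) {
      var raw = jsonString.split('"] = ')[1].split('\n')[0];
      try {
        var result = JSON.parse(raw);       // parse the embedded JSON blob
        console.log(Object.keys(result));   // inspect its top-level structure
      } catch (e) {
        console.log('Could not parse result ' + i + ': ' + e);
      }
    }
  });
}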
We have JavaScript files which get bundled and compressed using the normal ASP.NET MVC mechanism.
We also have some JavaScript files which get transformed via HTTP handlers to deal with phrases, colour schemes, etc. At present these are simply linked in; could these be compressed and bundled too, but at the user level?
Unfortunately we can't group these easily, but even if we could, we couldn't do it within a Global.asax file without a lot of rejigging. I mention this as it's not simply a case of having bundle1 = French, bundle2 = German, etc.
Compression, I'm assuming, could be done via IIS and static compression, but what about bundling?
Thanks.
There is no easy way to do this.
The easiest option I can see is to skip the Bundling and Minification that ships with MVC 5.
Handle it yourself. Generate CSS for your user and have it go through this piece of code:
public static string RemoveWhiteSpaceFromStylesheets(string body)
{
    // Collapse "element#id" selectors down to "#id"
    body = Regex.Replace(body, @"[a-zA-Z]+#", "#");
    // Remove line breaks and the indentation that follows them
    body = Regex.Replace(body, @"[\n\r]+\s*", string.Empty);
    // Collapse runs of whitespace to a single space
    body = Regex.Replace(body, @"\s+", " ");
    // Drop spaces around ':', ',', ';', '{' and '}'
    body = Regex.Replace(body, @"\s?([:,;{}])\s?", "$1");
    // Drop the trailing semicolon in each declaration block
    body = body.Replace(";}", "}");
    // Drop units on zero values (0px -> 0, etc.)
    body = Regex.Replace(body, @"([\s:]0)(px|pt|%|em)", "$1");
    // Remove comments from CSS
    body = Regex.Replace(body, @"/\*[\d\D]*?\*/", string.Empty);
    return body;
}
Or any CSS minifier for that matter. Just make sure to set proper caching headers for your user and you won't have to regenerate it too often.
Code taken from Mads Kristensen.
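For the caching side, a rough sketch (not from the original post): the controller, action, and BuildCssForUser helper are hypothetical placeholders for however you generate the per-user CSS, the minifier above is assumed to live in a static CssMinifier class, and VaryByCustom = "user" assumes you override GetVaryByCustomString in Global.asax to return something unique per user.
using System.Web.Mvc;

public class UserStylesController : Controller
{
    // Cache the generated stylesheet per user for an hour
    [OutputCache(Duration = 3600, VaryByCustom = "user")]
    public ContentResult Styles()
    {
        // Placeholder for your existing phrase/colour-scheme transform
        string rawCss = BuildCssForUser(User.Identity.Name);
        return Content(CssMinifier.RemoveWhiteSpaceFromStylesheets(rawCss), "text/css");
    }

    private string BuildCssForUser(string userName)
    {
        // ... produce the user-specific CSS here ...
        return string.Empty;
    }
}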
I need to call a URL server-side and work with the HTML content of the response. For this I'm using the HTTP library from Dart, like this:
http.read('myUrl').then((contents) {
  // Need to transform the String contents into an HTML document object
});
And I want to convert the response to an HTMLDocument (or some other object, I don't know which) so I can retrieve elements in it by HTML tag or CSS class, like with jQuery for example.
Does anybody have an idea how to do this?
You can use the html5lib package from pub. It allows you to parse HTML and present it as a DOM-like element tree on the server side. The element tree will eventually "be compatible with dart:html, so the same code will work on the client and the server". See the readme for a getting started example.
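A minimal sketch of what that can look like, assuming the package's parse() entry point and its query()/queryAll() lookups (which, at the time, only handled tag-name queries):
import 'package:html5lib/parser.dart' show parse;
import 'package:http/http.dart' as http;

void main() {
  http.read('http://example.com/').then((contents) {
    var document = parse(contents);       // build the DOM-like element tree
    var links = document.queryAll('a');   // tag-name query over the tree
    for (var link in links) {
      print(link.attributes['href']);
    }
  });
}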
"I need to call at server side"
Not sure exactly what you mean.
If you are running in the browser and calling the server you could try using a DocumentFragment. Something like this:
http.read(url).then((html) {
  var fragment = new DocumentFragment.html(html);
  var element = fragment.query('.foo');
  // code here...
});
Otherwise, if you're running server side then, as the other answer mentions, html5lib is the way to go. Last time I looked, the query() method in html5lib only supported tag-name queries, not classes or IDs.
I'm implementing a web robot that has to get all the links from a page and select the ones I need. I got it all working, except I encountered a problem where a link is inside a "table" or a "span" tag.
Here's my code snippet:
Document doc = Jsoup.connect(url)
        .timeout(TIMEOUT * 1000)
        .get();
Elements elts = doc.getElementsByTag("a");
And here's the example HTML:
<table>
  <tr><td><a href="http://example.com/">a link inside a table</a></td></tr>
</table>
My code will not fetch such links. Using doc.select doesn't help either. My question is: how do I get all the links from the page?
EDIT: I think I know where the problem is. The page I'm having trouble with is very badly written; the HTML validator throws a tremendous number of errors. Could this cause problems?
In general, Jsoup can handle most bad HTML. Dump the HTML as Jsoup uses it (you can simply output doc.toString()).
Tip: use select() instead of getElementsByX(); it's faster and more flexible.
Elements elts = doc.select("a");
Here's an overview of the Selector API: http://jsoup.org/cookbook/extracting-data/selector-syntax
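As a quick sketch of that selector syntax applied to this question (the URL is a placeholder), you can grab every anchor with an href, or scope the query to anchors nested inside tables or spans:
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class LinkDump {
    public static void main(String[] args) throws Exception {
        Document doc = Jsoup.connect("http://example.com/").timeout(10 * 1000).get();

        // Every anchor with an href attribute, however deeply nested
        Elements allLinks = doc.select("a[href]");
        // Only anchors that sit somewhere inside a <table> or a <span>
        Elements nestedLinks = doc.select("table a, span a");

        for (Element link : allLinks) {
            System.out.println(link.attr("abs:href") + " -> " + link.text());
        }
        System.out.println(nestedLinks.size() + " links found inside tables/spans");
    }
}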
Try this code:
String url = "http://test.com";
Document doc = null;
try {
    doc = Jsoup.connect(url).get();
    Elements links = doc.select("a[href]");
    for (Element link : links) {
        System.out.println("a= " + link.attr("abs:href"));
    }
} catch (IOException e) {
    e.printStackTrace();
}
I need some guidelines on how to detect the headline and content of crawled pages. I've been seeing some very weird front-end code since I started working on this crawler.
You could try the Simple HTML DOM Parser. It sports a jQuery-like syntax for finding specific elements.
They have an example on how to scrape Slashdot:
// Create DOM from URL
$html = file_get_html('http://slashdot.org/');

// Find all article blocks
foreach($html->find('div.article') as $article) {
    $item['title'] = $article->find('div.title', 0)->plaintext;
    $item['intro'] = $article->find('div.intro', 0)->plaintext;
    $item['details'] = $article->find('div.details', 0)->plaintext;
    $articles[] = $item;
}

print_r($articles);