Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I want to crawl an entire website. I am using Simple_html_dom for parsing, but the problem is that it takes only one page URL at a time. I want to provide only the start (home page) link and have it crawl and parse all the pages of that website automatically. Any suggestions on how to do this?
When parsing the DOM of that single page, store all links (within the same domain) in an array. Then, at the end of parsing, check whether the array is empty. If it isn't, take the first link and repeat the process.
So something like this (the sample is written in Python-like syntax, but you can adapt it to PHP easily; my PHP is rusty):
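referenced_links = ['your_initial_page.html']
visited = set()  # pages already crawled

def crawl_dom(url):
    # download the url, parse the DOM and append all same-domain hyperlinks to referenced_links
    pass

while referenced_links:  # while the array isn't empty...
    url = referenced_links.pop(0)  # take the first item and remove it from the array
    if url not in visited:  # skip pages already crawled, otherwise circular links loop forever
        visited.add(url)
        crawl_dom(url)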
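A fuller sketch of the same queue-based idea in plain Python, using only the standard library. The names LinkCollector, fetch and crawl are made up for this illustration, and the max_pages limit and the "same host" test are assumptions about what crawling "that website" should mean. It is not tuned for large sites and ignores robots.txt.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse
from urllib.request import urlopen


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def fetch(url):
    """Download a page and return its body as text (assumes UTF-8)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=100):
    """Breadth-first crawl of pages on the same host as start_url."""
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    visited = set()
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        if url in visited:
            continue
        visited.add(url)
        try:
            html = fetch_page(url)
        except OSError:
            continue  # unreachable page: skip it and carry on
        collector = LinkCollector()
        collector.feed(html)
        for href in collector.hrefs:
            # Resolve relative links and drop "#fragment" parts.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if urlparse(absolute).netloc == host and absolute not in visited:
                queue.append(absolute)
    return visited
```

Calling crawl("http://example.com/") returns the set of URLs that were visited. To parse each page, do the work right after fetch_page(url) returns, where the HTML is available.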
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 2 years ago.
I am a newbie with Apache Jena. I store my RDF dataset in Jena TDB and serve it with a Fuseki server. So far, so good. The problem is that I want the output of a SPARQL query to be displayed in an HTML page, and I can't find a way to do this.
If you have ideas, do not hesitate to share them with me!
For part of a page, you need to write a small piece of code that takes a result set and creates the HTML in the format and styling that you want.
You can add an XML stylesheet with "?stylesheet=" but that will get you a whole page.
See this example at www.sparql.org.
http://www.sparql.org/books/sparql?query=PREFIX+books%3A+++%3Chttp%3A%2F%2Fexample.org%2Fbook%2F%3E%0D%0APREFIX+dc%3A++++++%3Chttp%3A%2F%2Fpurl.org%2Fdc%2Felements%2F1.1%2F%3E%0D%0ASELECT+%3Fbook+%3Ftitle%0D%0AWHERE+%0D%0A++%7B+%3Fbook+dc%3Atitle+%3Ftitle+%7D&output=xml&stylesheet=%2Fxml-to-html.xsl
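For the first option (a small piece of code that turns a result set into HTML), here is a minimal sketch in Python. It assumes the query results have already been fetched from the server in the standard SPARQL JSON results format. The function name result_set_to_html and the sample data are made up for illustration, and the plain unstyled table is just one possible layout.

```python
import json
from html import escape


def result_set_to_html(results_json):
    """Render a SPARQL SELECT result (JSON results format) as an HTML table."""
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    rows = ["<tr>" + "".join(f"<th>{escape(v)}</th>" for v in variables) + "</tr>"]
    for binding in data["results"]["bindings"]:
        cells = []
        for v in variables:
            # A variable may be unbound in a given row (e.g. with OPTIONAL).
            value = binding.get(v, {}).get("value", "")
            cells.append(f"<td>{escape(value)}</td>")
        rows.append("<tr>" + "".join(cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical result for: SELECT ?book ?title WHERE { ?book dc:title ?title }
sample = json.dumps({
    "head": {"vars": ["book", "title"]},
    "results": {"bindings": [
        {"book": {"type": "uri", "value": "http://example.org/book/book1"},
         "title": {"type": "literal", "value": "Example Title"}},
    ]},
})
print(result_set_to_html(sample))
```

The returned string can be dropped into any part of an existing page, which is the difference from the stylesheet approach.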
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 7 years ago.
I need to get data from a website, but this website doesn't have an API. Therefore I need to write a bot that can use the search text field and get the URLs of the resulting links. How can I write a bot that runs on iOS with Swift?
You can download the HTML pages from the URLs and write regular expressions to parse them.
You do not need to write a bot. However, a plain request may return incomplete HTML, because some websites block requests that do not come from a browser. If that happens, you can edit the HTTP headers so that the request looks like it comes from a browser.
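As a rough illustration of editing the request headers, here is a sketch in Python rather than Swift, to match the other code on this page. It assumes the header the site checks is User-Agent, which the answer does not actually say. The function name browser_like_request and the user-agent string are placeholders, not values known to work for any particular site.

```python
from urllib.request import Request


def browser_like_request(url, user_agent="Mozilla/5.0 (compatible; ExampleClient/1.0)"):
    """Build (but do not send) a GET request that carries browser-style headers."""
    return Request(url, headers={"User-Agent": user_agent, "Accept": "text/html"})


request = browser_like_request("http://example.com/search?q=swift")
print(request.header_items())
```

Passing the returned object to urllib.request.urlopen sends it. The same idea applies on iOS: set the header on the request object before it is sent.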
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
If you visit xkcd.com, it displays a webcomic image. The direct link to that image is shown below it, for example:
Image URL (for hotlinking/embedding):
http://imgs.xkcd.com/comics/horse.png
Now, how do I parse the page to extract the image link and put it into an NSURL variable, using Xcode 6 and Swift, so that my UIWebView displays only the image rather than the whole page?
Thanks!
Use a regular expression. See this SO Answer for details.
Pay particular attention to the Notes:
(?<=) Ex: (?<=AB) means preceded by AB
(?=) Ex: (?=FG) means followed by FG
These do not capture the matched portion.
Documentation: ICU User Guide, Regular Expressions
You can use the Swift String method rangeOfString(pattern, options) with the .RegularExpressionSearch option in place of NSRegularExpression, which is much more involved.
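To show how the lookbehind and lookahead from the notes combine, here is a small sketch in Python (used only for illustration; the lookaround syntax is the same in ICU). It assumes the page text contains the label "Image URL (for hotlinking/embedding): " immediately before the link, and that the link ends at whitespace or at the next tag. The function name extract_image_url and the sample markup are made up.

```python
import re

# (?<=...) requires the label before the match, (?=...) requires what follows it.
# Neither is part of the matched text, so the match is the bare URL.
IMAGE_URL = re.compile(
    r"(?<=Image URL \(for hotlinking/embedding\): )\S+?(?=\s|<|$)"
)


def extract_image_url(page_text):
    """Return the hotlink URL found in the page text, or None if absent."""
    match = IMAGE_URL.search(page_text)
    return match.group(0) if match else None


sample = "<div>Image URL (for hotlinking/embedding): http://imgs.xkcd.com/comics/horse.png</div>"
print(extract_image_url(sample))
```

In Swift, the same pattern string can be passed to whichever regular expression call you use, and the matched range converted to a string and then to an NSURL.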
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I'm implementing a web-based application in which the help section has a "show more faqs" link that expands the page with more content from the category. The section should pull all FAQ questions and answers for the category from the database. I'm stuck on how to write the HTML-based code that shows the remaining content.
Show more faqs
There are two ways to do it:
If you are not comfortable with Ajax: load all the remaining FAQs during the page load and wrap them in a DIV with a hidden class. When the show-more link is clicked, remove the hidden class from the DIV.
If you are OK with Ajax: on click of the show-more link, send an Ajax request. The server should respond with the remaining FAQs. When the Ajax request succeeds, append the response to the existing FAQs HTML. https://gist.github.com/satyatechsavy/7673942
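For the Ajax variant, the server side has to return only the FAQs that are not on the page yet. A minimal sketch of that step in Python follows, assuming the FAQs have already been read from the database into a list of (question, answer) pairs. The function name render_remaining_faqs, the already_shown parameter and the markup are hypothetical and not taken from the linked gist.

```python
from html import escape


def render_remaining_faqs(faqs, already_shown):
    """Return an HTML fragment for the FAQs the page has not displayed yet."""
    remaining = faqs[already_shown:]
    items = [
        f'<div class="faq"><h4>{escape(question)}</h4><p>{escape(answer)}</p></div>'
        for question, answer in remaining
    ]
    return "\n".join(items)


faqs = [
    ("How do I reset my password?", "Use the reset link on the login page."),
    ("Can I change my email?", "Yes, under account settings."),
    ("How do I delete my account?", "Contact support."),
]
# The page already shows the first FAQ, so only the last two are rendered.
print(render_remaining_faqs(faqs, already_shown=1))
```

The fragment it returns is what the Ajax success handler would append to the existing FAQ list. For the non-Ajax variant, the same fragment could be emitted at page load inside the hidden DIV.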
Closed. This question needs to be more focused. It is not currently accepting answers.
Closed 8 years ago.
I have 2 different modules in my project:
1) Album
2) User
In my Album module, the index action shows all the data from the album table.
I want to create a PDF from the album table data. How can I configure this, and which class can I use to generate the PDF?
You need the ZendPdf package, just follow the instructions on the page. Documentation is not ready yet simply because it's not a core module. Documentation will probably follow as soon as all bugs are worked out with the core stuff.
I suggest DomPDF. It creates PDF documents from HTML documents. It supports CSS for formatting, renders tables, embeds images, and has all the features I have ever needed and many more. It can even embed JavaScript inside the PDF, and that actually works.
I use Zend_View to render the HTML document, which I then pass to DomPDF to render the PDF.
http://code.google.com/p/dompdf/