I am trying to get wkhtmltopdf to take a screenshot of http://www.health.ny.gov/professionals/doctors/conduct/license_lookup.htm with the license search field set to "lastname,first". Below is what I am currently sending on the command line, and it is not working.
wkhtmltopdf -n post http://www.nysed.gov/COMS/OP001/OPSCR1 lastname,first screenshot.pdf
Any ideas?
If you are trying to get a PDF out of the search results page, the following command works using the latest version at the time of posting (wkhtmltopdf 0.11.0 rc2):
wkhtmltopdf --post profcd 60 --post pname smith http://www.nysed.gov/COMS/OP001/OPSCR1 out.pdf
You can check the version with wkhtmltopdf -V.
You need the profcd because the backend seems to require it for the search to work; otherwise you get something like "must select Profession".
Is that what you meant? Or did you want a PDF of the search page itself, with the form inputs filled in with your values? That would require some trickery, though I don't think it's impossible.
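For the form-filling variant, one possible trick is wkhtmltopdf's --run-script option, which runs extra JavaScript after the page loads. This is a hedged sketch only: the field names profcd and pname are borrowed from the POST example above, and the selectors must be verified against the HTML of whichever page actually hosts the form.

```shell
# Hypothetical: pre-fill the form via injected JavaScript, then render.
# Field names and the form URL are assumptions; inspect the page to confirm.
wkhtmltopdf \
  --run-script "document.getElementsByName('profcd')[0].value='60';" \
  --run-script "document.getElementsByName('pname')[0].value='smith';" \
  http://www.health.ny.gov/professionals/doctors/conduct/license_lookup.htm \
  form.pdf
```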
I am using Common Crawl to restore pages I should have archived but have not.
In my understanding, the Common Crawl Index offers access to all URLs stored by Common Crawl. Thus, it should tell me whether a URL has been archived.
A simple script downloads all indices from the available crawls:
./cdx-index-client.py -p 4 -c CC-MAIN-2016-18 *.thesun.co.uk --fl url -d CC-MAIN-2016-18
./cdx-index-client.py -p 4 -c CC-MAIN-2016-07 *.thesun.co.uk --fl url -d CC-MAIN-2016-07
... and so on
Afterwards I have 112 MB of data and simply grep:
grep "50569" * -r
grep "Locals-tell-of-terror-shock" * -r
The pages are not there. Am I missing something? The pages were published in 2006 and removed in June 2016, so I assume that Common Crawl should have archived them?
Update: Thanks to Sebastian, two links are left...
Two URLs are:
http://www.thesun.co.uk/sol/homepage/news/50569/Locals-tell-of-terror-shock.html
http://www.thesun.co.uk/sol/homepage/news/54032/Sir-Ians-raid-apology.html
They even propose a "URL Search Tool", but it answers with a 502 Bad Gateway...
You can use AWS Athena to query the Common Crawl index with SQL to find the URL, and then use the offset, length and filename to read the content in your code. See the details here: http://commoncrawl.org/2018/03/index-to-warc-files-and-urls-in-columnar-format/
The latest version of the search on the CC index provides the ability to search and get results for all the URLs from a particular domain.
In your case, you can use http://index.commoncrawl.org, select the index of your choice, and search for http://www.thesun.co.uk/*.
Hopefully you will get all the URLs from the domain, and you can then filter the URLs of your choice from the JSON response.
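As a sketch of the programmatic route (the endpoint and the output=json format follow the CDX API as served at index.commoncrawl.org; each result line is one JSON object carrying filename, offset and length fields):

```ruby
require 'json'
require 'cgi'

# Build the CDX API query URL for one crawl index.
def cdx_query_url(index, url_pattern)
  "https://index.commoncrawl.org/#{index}-index?url=#{CGI.escape(url_pattern)}&output=json"
end

# Each result line is a JSON object; pull out the triple needed to
# locate the record inside the archive files.
def parse_cdx_line(line)
  rec = JSON.parse(line)
  [rec['filename'], Integer(rec['offset']), Integer(rec['length'])]
end

puts cdx_query_url('CC-MAIN-2016-18', 'thesun.co.uk/sol/homepage/news/*')
```

Fetching that URL (network required) and feeding each response line through parse_cdx_line gives you the triples to read the content out of the archives.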
AFAIK pages are crawled once and only once, so the pages you're looking for could be in any of the archives.
I wrote a small piece of software that can be used to search all archives at once (here's also a demonstration showing how to do this). So in your case I searched all archives (2008 to 2019), entered your URLs in the Common Crawl editor, and found these results for your first URL (I couldn't find the second, so I guess it is not in the database?):
FileName Offset Length
------------------------------------------------------------- ---------- --------
parse-output/segment/1346876860877/1346943319237_751.arc.gz 7374762 12162
crawl-002/2009/11/21/8/1258808591287_8.arc.gz 87621562 20028
crawl-002/2010/01/07/5/1262876334932_5.arc.gz 80863242 20075
I'm not sure why there are three results; I guess they do re-scan some URLs.
If you open any of these URLs in the application I linked, you should be able to see the pages in a browser (this is a custom scheme that includes the filename, offset and length in order to load the HTML from the Common Crawl database):
crawl://page.common/parse-output/segment/1346876860877/1346943319237_751.arc.gz?o=7374762&l=12162&u=http%3A%2F%2Fwww.thesun.co.uk%2Fsol%2Fhomepage%2Fnews%2F50569%2FLocals-tell-of-terror-shock.html
crawl://page.common/crawl-002/2009/11/21/8/1258808591287_8.arc.gz?o=87621562&l=20028&u=http%3A%2F%2Fwww.thesun.co.uk%2Fsol%2Fhomepage%2Fnews%2F50569%2FLocals-tell-of-terror-shock.html
crawl://page.common/crawl-002/2010/01/07/5/1262876334932_5.arc.gz?o=80863242&l=20075&u=http%3A%2F%2Fwww.thesun.co.uk%2Fsol%2Fhomepage%2Fnews%2F50569%2FLocals-tell-of-terror-shock.html
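Those filename/offset/length triples can also be used without the application: a plain HTTP Range request pulls just the one gzipped record out of the archive file. A sketch, assuming the files are served from the public data.commoncrawl.org endpoint (whether the very old parse-output and crawl-002 paths are still hosted there is an assumption to verify):

```ruby
require 'net/http'
require 'uri'
require 'zlib'
require 'stringio'

# The byte range covering one record: [offset, offset + length - 1].
def range_header(offset, length)
  "bytes=#{offset}-#{offset + length - 1}"
end

# Fetch and decompress a single archived page by its index triple.
def fetch_cc_record(filename, offset, length)
  uri = URI("https://data.commoncrawl.org/#{filename}")
  req = Net::HTTP::Get.new(uri)
  req['Range'] = range_header(offset, length)
  res = Net::HTTP.start(uri.host, uri.port, :use_ssl => true) { |h| h.request(req) }
  Zlib::GzipReader.new(StringIO.new(res.body)).read
end
```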
I have a Ruby on Rails based API which accepts a GET request.
Example:
http://localhost:3000/api/search?query=whatis&access_token=324nbkjh3g32423
When I do curl from the Mac terminal, like
curl http://localhost:3000/api/search?query=whatis&access_token=324nbkjh3g32423
I checked on the server with request.fullpath; it returns only "/api/search?query=whatis", so the second parameter is missing.
However, if I do curl like
curl --data="query=whatis&access_token=324nbkjh3g32423" http://localhost:3000/api/search
it takes all the parameters.
I understand there is a problem with encoding, but I want to know what the difference is between the two requests.
Thanks in advance.
The problem is probably that the bash shell sees & as the end of the command.
Try quoting the entire query string, like this:
curl "http://localhost:3000/api/search?query=whatis&access_token=324nbkjh3g32423"
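You can see the effect without a server at all by substituting echo for curl: unquoted, the shell ends the first command at the &, runs it in the background, and treats access_token=... as a separate variable assignment.

```shell
# Unquoted: '&' ends the command, so only the first parameter survives.
echo http://localhost:3000/api/search?query=whatis&access_token=tok
# prints: http://localhost:3000/api/search?query=whatis

# Quoted: the full URL is passed as a single argument.
echo "http://localhost:3000/api/search?query=whatis&access_token=tok"
# prints: http://localhost:3000/api/search?query=whatis&access_token=tok
```

Another option is to let curl assemble the query string itself with -G plus --data-urlencode, which also percent-encodes the values for you.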
Let's say I have a string which contains text grabbed from Twitter, as follows:
myString = "I like using @twitter, because I learn so many new things! [line break]
Read my blog: http://www.myblog.com #procrastination"
The tweet is then presented in a view. However, prior to this, I'd like to convert the string so that, in my view:
@twitter links to http://www.twitter.com/twitter
The URL is turned into a link (in which the URL remains the link text)
#procrastination is turned into https://twitter.com/i/#!/search/?q=%23procrastination, in which #procrastination is the link text
I'm sure there must be a gem out there that would allow me to do this, but I can't find one. I have come across twitter-text-rb but I can't quite work out how to apply it to the above. I've done it in PHP using regex and a few other methods, but it got a bit messy!
Thanks in advance for any solutions!
The twitter-text gem has pretty much all the work covered for you. Install it manually (gem install twitter-text, use sudo if needed) or add it to your Gemfile (gem 'twitter-text') if you are using bundler and do bundle install.
Then include the Twitter auto-link library (require 'twitter-text' and include Twitter::Autolink) at the top of your class, and call the method auto_link(inputString) with the input string as the parameter; it will give you the auto-linked version.
Full code:
require 'twitter-text'
include Twitter::Autolink
myString = "I like using @twitter, because I learn so many new things! [line break]
Read my blog: http://www.myblog.com #procrastination"
linkedString = auto_link(myString)
If you output the contents of linkedString, you get the following output:
I like using @<a class="tweet-url username" href="https://twitter.com/twitter" rel="nofollow">twitter</a>, because I learn so many new things! [line break]
Read my blog: http://www.myblog.com <a class="tweet-url hashtag" href="https://twitter.com/#!/search?q=%23procrastination" rel="nofollow" title="#procrastination">#procrastination</a>
Use jQuery Tweet Linkify
A small jQuery plugin that transforms @mention texts into hyperlinks pointing to the actual Twitter profile, #hashtag texts into real hashtag searches, and hyperlink texts into actual hyperlinks
Using Rails 2.3 with Ruby 1.8.7
I am working with an SQL Server database on a windows server with collation
SQL_Latin1_General_CP1_CI_AS
When I go to the rails console on the Linux server with the app and query the problem record I get
=> "Rodríguez, César"
To try to isolate the problem in my controller I tried just render :text => with the record's problem field, but on the browser I am seeing
Rodr?guez, C?sar
I believe this is an encoding issue, but I don't know how to resolve it (and my Google + Stack Overflow skills are failing me). Given that the source data can't be changed, what do I need to do on the Rails side to get the text to render properly?
In Chrome I have tried manually changing the encoding, but no matter which one I select I can't get the text to render correctly.
Also, why would it render correctly on the console?
Character encoding defaults to Unicode in Firefox, and the same is true for Chrome. Just check whether you have tried with these.
You need to check and confirm a few things:
--Meta tags in the HTML page: check the charset in the page source. Change it to utf-8 in the layout and try again.
--Database encoding.
--Select a character set that contains mappings for all the characters that the application and its users will want to see.
There may be better solutions; still, you could try using the Inkscape command-line tool to change the text to image files and then display those.
Encoding is handled here with no issues currently.
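If the bytes coming back from SQL Server really are single-byte Latin-1 (which would be consistent with the SQL_Latin1_General_CP1_CI_AS collation), explicitly transcoding them to UTF-8 before rendering is one way out. A minimal sketch, written with the Ruby 1.9+ String API; on Ruby 1.8.7, which this question uses, the rough equivalent is Iconv.conv('UTF-8', 'ISO-8859-1', str):

```ruby
# Sketch: re-tag the raw bytes as ISO-8859-1, then transcode to UTF-8.
# Assumes the database really hands back Latin-1 bytes; verify with
# str.bytes on a known record before relying on this.
def latin1_to_utf8(str)
  str.dup.force_encoding('ISO-8859-1').encode('UTF-8')
end

# "Rodr\xEDguez" is the Latin-1 byte sequence for "Rodríguez":
latin1_to_utf8("Rodr\xEDguez".b)  # => "Rodríguez"
```

In the controller that would become render :text => latin1_to_utf8(record.name), alongside a utf-8 charset in the layout's meta tag.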
How can I convert HTML to Word?
Thanks.
I have created a Ruby HTML-to-Word gem that should help you do just that. You can check it out at https://github.com/nickfrandsen/htmltoword. You simply pass it an HTML string and it will create a corresponding Word docx file.
def show
  respond_to do |format|
    format.docx do
      file = Htmltoword::Document.create params[:docx_html_source], "file_name.docx"
      send_file file.path, :disposition => "attachment"
    end
  end
end
Hope you find it helpful.
I am not aware of any solution which does this, i.e. converts HTML to Word format. If you literally mean that, you will have to parse the HTML document first, using something like Nokogiri. If you just want to output data persisted in your model objects, there is obviously no need to parse HTML! As far as outputting to Word goes, I'm afraid it looks as if you will have to interface directly with a running instance of Microsoft Word via OLE!
A quick google search for win32ole ruby word will get you started:
http://blade.nagaokaut.ac.jp/cgi-bin/scat.rb/ruby/ruby-talk/241606
Good luck!
I agree with CodeJoust that it is better to generate a PDF. However, if you really need to generate a Word document then you can do the following:
If your server is a Windows machine, you can install Office in it and use ruby's OLE binding to generate the Word document into the public folder and then deliver the file in the response.
To use ruby's OLE binding, see the "Programming Ruby" ebook that comes with the one-click ruby installer for Windows. You may have to use custom logic to convert from HTML to Word unless you can find a function in the OLE api of Word to do that.
http://prawn.majesticseacreature.com/
You could allow the user to download a PDF or an .html file, but there aren't any helpful Ruby libraries for converting to Word. You're better off generating a 'printable and downloadable' version without much styling, and/or a PDF version using a library like Prawn.
You could always generate a simple .rtf file; I think Word will be pretty happy reading that...
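A minimal sketch of that last idea: RTF is plain text, so for simple documents you can emit it by hand. This escapes only the three RTF special characters and turns newlines into paragraph breaks; anything fancier deserves a real library.

```ruby
# Build a trivial RTF document from plain text. Backslash, '{' and '}'
# are RTF specials; newlines become \par paragraph breaks.
def simple_rtf(text)
  escaped = text.gsub(/[\\{}]/) { |c| "\\#{c}" }.gsub("\n") { "\\par " }
  "{\\rtf1\\ansi #{escaped}}"
end

File.open('hello.rtf', 'w') { |f| f.write(simple_rtf("Hello, Word!\nSecond paragraph.")) }
```

Word (and most word processors) will open the resulting hello.rtf directly.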