HTTP GET on craigslist blocked - ruby-on-rails

I'm trying to do an HTTP GET on craigslist, sfbay.craigslist.org. Here is my (Ruby) code, which is really simple:
require 'net/http'
result = Net::HTTP.get(URI.parse('http://sfbay.craigslist.org'))
I end up getting an error "This IP has been automatically blocked."
This behaviour only happens when I run it from Amazon EC2 or Heroku. When I try again on my own computer (localhost) I get the correct result. Does this have something to do with Amazon EC2?
I'm wondering if other people have had the same issue. What can I do to access craigslist from EC2?

I can confirm that Craigslist is blocking the major Amazon EC2 IP ranges by IP (not by user agent). It works from elsewhere, though I suspect any real volume would get other IPs blocked as well.
You could step around it with Tor. More usefully, this Stack Overflow question discusses the data sources used by craigslist mashups.
I even tested from a Brazilian EC2 instance, on the theory that they might not have all the CIDR ranges blocked. No bueno.
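If you do go the proxy route, Net::HTTP can talk to an HTTP proxy natively. A minimal sketch, assuming a local Privoxy instance fronting Tor (the host and port are placeholders; 8118 is Privoxy's default):
require 'net/http'
# Placeholder proxy address -- e.g. a local Privoxy instance
# forwarding to Tor's SOCKS port.
proxy_host = '127.0.0.1'
proxy_port = 8118
uri = URI.parse('http://sfbay.craigslist.org/')
response = Net::HTTP.start(uri.host, uri.port, proxy_host, proxy_port) do |http|
  http.get(uri.request_uri)
end
puts response.code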

Related

Rails EC2 testp2.czar.bielawa.pl

I deployed a Rails app on an EC2 instance, and this morning when I clicked on a section of the app it redirected to http://testp2.czar.bielawa.pl/.
I would like to know if this is malware or what, because this link is not part of the app.
Thanks
Yes, it's a type of malware. You might not be the actual target; rather, your server is likely being used as a source of spam, port scanning, and DDoS attacks.
There is a pretty extensive abuse history for
http://testp2.czar.bielawa.pl/
See here: http://www.abuseipdb.com/report-history/185.25.151.159
To get rid of it, follow the excellent instructions from Server Fault below.
https://serverfault.com/questions/218005/how-do-i-deal-with-a-compromised-server
Alternatively, if there is nothing important on it, just delete the EC2 instance and start again.

Are any HTTP log files saved on my system by default?

I have an application hosted on Amazon EC2 on an Ubuntu machine, written in Ruby (on Rails), deployed with Capistrano and running on Nginx.
Last Friday one module of my application crashed, and nobody in the company noticed until this morning. We spent some money on Facebook and Google ads and received a few hundred visits, but nobody could create an account because of this bug.
I wonder whether this stack saves the HTTP requests and their bodies in a log file somewhere. We didn't explicitly set this up, so it would only happen if one of these technologies does it by default.
Do you know whether such a log exists?
Nope, that wouldn't be anywhere in a usable form (I'm inferring that you want to try to recreate the accounts from request bodies in log files). You'll have the requests themselves in your Nginx logs, and the Rails logs will contain more info about each request, but as a matter of security, any sensitive information (e.g. passwords) is scrubbed from them by default via Rails' config.filter_parameters. You may still be able to get some info from them.
To answer your question a little more specifically, the usual place for these logs on your system would be:
/var/log/nginx/
/path/to/your/rails/app/log/production.log
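If the sign-ups came in as form POSTs, something like the following rough sketch could pull the relevant entries out of the Rails log. The log path and the "/users" route are assumptions about your app; filtered values will appear as [FILTERED]:
# Assumed path and route -- adjust to your app's actual layout.
log_path = '/path/to/your/rails/app/log/production.log'
File.foreach(log_path) do |line|
  # Request lines look like:   Started POST "/users" for 1.2.3.4 at ...
  # Parameter lines look like: Parameters: {"user"=>{"email"=>"..."}}
  puts line if line.include?('Started POST "/users"') || line.include?('Parameters:')
end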
On a separate note, I would recommend looking into an error reporting service like Honeybadger, Airbrake, Raygun, Appsignal, or others so that you don't have silent failures like this moving forward.

Use proxy or Tor within Heroku rails app to hide IP

I'm using Mechanize inside a rake task that is run by a scheduler add-on for my Ruby app on Heroku. In the script I log into a web page; this worked until recently, when the script could no longer log in. While debugging, I noticed that Mechanize shows different form fields when I run the script in the Heroku console than in my local console.
My local Ruby console shows these fields:
>> asf.fields.each do |f| puts f.name end
__VIEWSTATE
__PREVIOUSPAGE
__EVENTVALIDATION
login$field
password$field
The Heroku console shows one additional field that does NOT appear in the HTML source:
>> asf.fields.each do |f| puts f.name end
__VIEWSTATE
__PREVIOUSPAGE
__EVENTVALIDATION
login$field
password$field
captcha$txtCaptcha
When I issue asf.click_button, the login fails.
Update:
I tried changing the user agent to several different browser aliases, with no luck. It appears that the Heroku IP address is causing the captcha to be served. Would it be possible to make the request through a proxy server or use Tor to keep the IP from being exposed?
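For reference, swapping user agents in Mechanize looks roughly like this; the alias names are Mechanize built-ins, and the login URL is a placeholder:
require 'mechanize'
agent = Mechanize.new
# Mechanize ships with a set of browser aliases, e.g. 'Windows Mozilla',
# 'Mac Safari', 'Linux Firefox'.
agent.user_agent_alias = 'Windows Mozilla'
page = agent.get('https://example.com/login')  # placeholder URL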
The answer to your question is yes, you can proxy through Tor. I've done it in the past; issues you will face:
You'll have to run Tor somewhere else if you're running on Heroku
Tor is pretty slow for scraping
You'll need to set up a proxy that can speak to Tor (e.g. Privoxy)
For any serious scraping you'll need multiple Tor instances running
Even your Tor exit IPs will get blocked after a while
It makes you wonder whether it's worth the hassle. You can pay for IP-masking proxy services, which might be an easier way to go.
This link got me some of the way when I was looking into this: http://www.howtoforge.com/ultimate-security-proxy-with-tor
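Wiring Mechanize to such a proxy is a one-liner. A sketch, assuming a Privoxy instance reachable at a placeholder host (8118 is Privoxy's default port, and the login URL is made up):
require 'mechanize'
agent = Mechanize.new
# Point Mechanize at the HTTP proxy; Privoxy in turn forwards the
# traffic to Tor's SOCKS port. Host is a placeholder.
agent.set_proxy('proxy.example.com', 8118)
page = agent.get('https://example.com/login')  # placeholder URL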

emails sent from production server end up in spam

I use sendmail to send emails from my application. I always send the emails from SOME_NAME@MY_DOMAIN.com, but they always end up in the spam folder.
I know that I should set some things up on the DNS side to keep my emails from being marked as spam, but I don't know what they are.
I am a newbie and this is my first time setting up a production server, a domain, and everything else myself. I'd appreciate it if someone could help me.
What sort of environment are you deploying to?
This frequently happens to applications deployed to cloud services like Amazon or Rackspace. Their entire IP blocks are registered as spam sources at services like Spamhaus, which is a sensible precaution, or else we'd be getting even more spam than usual. You can look up your server's IP address at Spamhaus to see if you're listed as a spammer.
If you are, you can request that Spamhaus lift the block. Getting in touch with Amazon's support staff also helps. Finally, you can get around the issue entirely by using a mail-delivery service of some sort -- Amazon SES is pretty good, and there's even a gem out there that provides integration for Rails apps.
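If you go the SES route, pointing ActionMailer at SES over SMTP is a small config change. A sketch, assuming the us-east-1 endpoint and SMTP credentials stored in environment variables (the region and the environment variable names are placeholders):
# config/environments/production.rb
config.action_mailer.delivery_method = :smtp
config.action_mailer.smtp_settings = {
  address:              'email-smtp.us-east-1.amazonaws.com',  # assumed region
  port:                 587,
  user_name:            ENV['SES_SMTP_USERNAME'],  # SES-generated SMTP credentials,
  password:             ENV['SES_SMTP_PASSWORD'],  # not your AWS keys
  authentication:       :login,
  enable_starttls_auto: true
}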

Rails/Passenger/Apache: Simple one-off URL redirect to catch stale DNS after server move

One of my Rails apps (using Passenger and Apache) is changing server hosts. I've got the app running on both servers (the new one in testing) and the DNS TTL set to 5 minutes. A colleague has told me (and I've experienced something like this myself) that DNS resolvers sometimes ignore the TTL and may keep the old IP cached for some time after I update DNS to point at the new server.
So, after I've thrown the switch on DNS, what I'd like to do is hack the old server to issue a forced redirect to the IP address of the new server for all visitors. Obviously I can do a number of redirects (301, 302) in either Apache or the app itself. I'd like to avoid the app method, since I don't want to do a check-in and deploy of code just for this one instance, so I was thinking a basic HTTP URL redirect would work. But there are SEO implications should Google visit the old site, etc.
How best to achieve the redirect while maintaining search-engine niceness?
I guess the question is - where would you redirect to? If you are redirecting to the domain name, the browser (or bot) would just get the same old IP address and end up in a redirect loop.
If you redirect to an IP address.. well, that's not going to look very user friendly in someone's browser.
Personally, I wouldn't do anything. There may be a short period where bots get errors trying to access your site, but it should all work itself out in a couple of days without any "SEO damage".
One solution might be to use mod_proxy instead of a rewrite to proxy traffic to the new host. This way you shouldn't see any "SEO damage".
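A minimal sketch of what that could look like on the old server, assuming mod_proxy and mod_proxy_http are enabled (the domain and the new server's IP are placeholders):
<VirtualHost *:80>
    # Placeholder domain; 203.0.113.10 stands in for the new server's IP
    ServerName www.example.com
    ProxyPreserveHost On
    ProxyPass        / http://203.0.113.10/
    ProxyPassReverse / http://203.0.113.10/
</VirtualHost>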
I used rinetd to redirect the traffic from the old server to the new one at the IP level. No web server or virtual host configuration needed. It runs very smoothly and is absolutely transparent to any client.
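For reference, the rinetd configuration for this is a one-line forwarding rule per port (the new server's IP here is a placeholder):
# /etc/rinetd.conf -- bindaddress bindport connectaddress connectport
0.0.0.0 80 203.0.113.10 80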
