Sitemap_generator produces sitemap at unknown URL - ruby-on-rails

I'm using the sitemap_generator gem to produce sitemaps for my site.
Producing a sitemap locally works fine, and I can access the sitemap at http://localhost:3000/sitemap1.xml.gz.
After deploying to Heroku and running
heroku run rake sitemap:refresh --app myapp-prod
I get this:
In /app/public/
+ sitemap1.xml.gz 254 links / 4.74 KB
+ sitemap_index.xml.gz 1 sitemaps / 231 Bytes
Sitemap stats: 254 links / 1 sitemaps / 0m06s
So far so good - however, when trying to access my sitemap at https://myapp.com/sitemap1.xml.gz, I get a 404 error. I've tried the following ways to resolve this but none have worked:
Adding the two locally generated XML files with git add, pushing them to Heroku, and calling heroku run rake sitemap:refresh --app myapp-prod to replace the locally generated URLs with my production URLs. However, the files are not refreshed: they stay exactly as generated locally, even though the same message as above is returned.
Producing the sitemap into a custom path, e.g. public/shared/. But the error persists when accessing https://myapp.com/shared/sitemap1.xml.gz.
All possible and impossible URL combinations, like https://myapp.com/public/sitemap1.xml.gz (which of course were never going to work, but I wanted to leave no stone unturned)
Any ideas as to what could cause this behaviour, and where the sitemap might be stored?

After some further research I finally figured out what the issue was.
Heroku dynos have an ephemeral filesystem, and heroku run executes in a one-off dyno whose filesystem is separate from the web dynos', so sitemaps generated into the public directory are never visible to the running app. This is explained in more detail here:
https://devcenter.heroku.com/articles/dynos#ephemeral-filesystem
The solution that worked for me in the end was generating the sitemaps into my Amazon S3 storage, as described here.
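For reference, the S3 setup in config/sitemap.rb looks roughly like this. This is a sketch based on the gem's S3Adapter; the bucket name, region, and environment variable names are placeholders:

```ruby
# config/sitemap.rb - a sketch; bucket, region and env var names are placeholders.
SitemapGenerator::Sitemap.default_host  = "https://myapp.com"
# The host search engines should fetch the sitemap from:
SitemapGenerator::Sitemap.sitemaps_host = "https://my-bucket.s3.amazonaws.com/"
# Write locally to tmp/ (writable on Heroku), then upload via the adapter:
SitemapGenerator::Sitemap.public_path   = "tmp/"
SitemapGenerator::Sitemap.sitemaps_path = "sitemaps/"
SitemapGenerator::Sitemap.adapter = SitemapGenerator::S3Adapter.new(
  fog_provider:          "AWS",
  aws_access_key_id:     ENV["AWS_ACCESS_KEY_ID"],
  aws_secret_access_key: ENV["AWS_SECRET_ACCESS_KEY"],
  fog_directory:         "my-bucket",
  fog_region:            "us-east-1"
)
```

With sitemaps_host set, the refresh task also pings search engines with the S3 URL instead of the app's own domain.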

What did you do about submitting your sitemap to Google Webmaster Tools? By default it expects the sitemap to live at http://www.yoursite.com/.....
I have my sitemap hosted in Amazon S3 and it's referenced in my robots.txt file but I need to tell Google Web Master Tools where to find it.
Any advice?
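For what it's worth, the robots.txt side of this is just a Sitemap directive pointing at the S3 URL (the bucket path below is a placeholder); crawlers that read robots.txt will discover the sitemap there, and Google Webmaster Tools also accepts cross-host sitemap submissions once the hosting site is verified:

```text
User-agent: *
Sitemap: https://my-bucket.s3.amazonaws.com/sitemaps/sitemap_index.xml.gz
```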

Related

Sitemap generation does not save file to storage

I'm just getting a sitemap going with the rails gem and am having trouble generating a sitemap in production.
Running the rake command rake sitemap:refresh in development creates the sitemap.xml.gz file in the public folder. When I navigate to localhost:3000/sitemap.xml.gz, it downloads the zipped file.
When I run it in production (Heroku-like command line with Dokku on a Digital Ocean VM) I get:
+ sitemap.xml.gz 6 links / 450 Bytes
Sitemap stats: 6 links / 1 sitemaps / 0m00s
Pinging with URL 'https://www.myapp.com/sitemap.xml.gz':
Successful ping of Google
Successful ping of Bing
It appears the file has been created, so I navigate to www.myapp.com/sitemap.xml.gz and get a 404 response.
The server says:
ActionController::RoutingError (No route matches [GET] "/sitemap.xml.gz"):
It appears that this request is hitting the Rails stack when it should be served by Nginx. I just checked to see if the file exists:
FileTest.exists?("public/sitemap.xml.gz")
It returns false so it seems like the sitemap is not actually saved on file. Is there a possibility my file system is read-only right now? How could I test that?
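One quick way to test that is a write probe: try creating a throwaway file under public/ and see whether the operating system rejects it. A sketch (the probe filename is arbitrary), e.g. from rails console:

```ruby
# Try to create and delete a small file in the given directory.
# Returns false if the filesystem is read-only (or the directory is missing).
def writable?(dir)
  probe = File.join(dir, ".write_probe_#{Process.pid}")
  File.write(probe, "test")
  File.delete(probe)
  true
rescue SystemCallError # Errno::EROFS, Errno::EACCES, Errno::ENOENT, ...
  false
end

puts writable?("public") ? "public/ is writable" : "public/ is read-only or missing"
```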
With the new dokku docker-options plugin, you can mount a persistent volume from your host machine into your container.
First create a local directory in your host machine.
mkdir <path/to/dir>
Then add the following docker-options in dokku
dokku docker-options:add <app> deploy,run -v <path/to/host/dir>:<path/to/container/public/sub/dir>:rw
In your config/sitemap.rb file, add the following lines:
SitemapGenerator::Sitemap.public_path = 'public/sitemap/'
SitemapGenerator::Sitemap.sitemaps_path = 'sitemap/'
The sitemap:refresh rake task should write into the sitemap sub folder within the public folder.
This would also allow sitemap_generator to ping the search engine with the right address to your sitemap.xml.gz file.
Feel free to give this a try.
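Put together, a minimal config/sitemap.rb for that setup might look like the sketch below; the host and the extra routes are placeholders for your own app (the gem adds the root URL automatically):

```ruby
# config/sitemap.rb - a sketch; host and routes are placeholders.
SitemapGenerator::Sitemap.default_host  = "https://www.myapp.com"
SitemapGenerator::Sitemap.public_path   = "public/sitemap/"  # the mounted volume
SitemapGenerator::Sitemap.sitemaps_path = "sitemap/"         # served at /sitemap/sitemap.xml.gz

SitemapGenerator::Sitemap.create do
  add "/about",   changefreq: "monthly"
  add "/contact", changefreq: "monthly"
end
```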
I believe this is a dokku related "issue". Dokku uses Heroku buildpacks, and this yields a read-only file system like on Heroku.
I'd be curious to know if there's a way to modify that behavior in Dokku (seems unlikely if using Heroku buildpacks), but that's a bit out of my league.
I think the best solution to this problem is the same as on Heroku - using Amazon S3.
The sitemap_generator gem has docs on how to set this up.

Pushing RoR app to Heroku, files in /public are missing

Let me start by saying I am using Ruby 2.0.0 and Rails 4.1.1
I've worked my way through the Treehouse basic RoR course; ending in a very basic version of Twitter. I have the application running just fine on my local install, but when I pushed it to Heroku it seems to be missing the files in the /public directory; namely the /assets css and javascript.
I've precompiled my assets as instructed, and verified that they are indeed showing up on my GitHub remote that is using the same branch. I was told that Heroku will not compile your assets for you.
All my routes and HTML are displaying just fine, but I cannot pull up any of the files that live in the /public directory (404.html, 500.html, etc.).
It feels to me like it is a permissions issue or something with the /public directory, but I haven't found a way to actually browse what files are on my Heroku instance. I've tried re-pushing several times while making small changes, and the css/js never seems to appear.
In case you have already set:
config.serve_static_assets = true
in your config/environments/production.rb
and it's still not working, you can see the logs from your Heroku app by running heroku logs or heroku logs -n NUMBER_OF_DESIRED_LINES_HERE in your terminal.
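For context, the relevant setting in config/environments/production.rb for that era of Rails (4.x; the option was later renamed serve_static_files and then public_file_server.enabled) is:

```ruby
# config/environments/production.rb - Rails ~4.1 sketch
# Heroku has no front-end web server serving public/, so Rails must do it:
config.serve_static_assets = true
```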

rails app - sudden 403 after pull - how do I start to debug?

I've been working on a Rails 3.1 app with one other dev.
I've just pulled some of his recent changes using git, and am now getting a 403 on any page I try to visit.
You don't have permission to access / on this server.
I'm running the site locally through passenger.
Oddly, when I start the app using Rails' internal server, I can visit the site at http://0.0.0.0:3000.
Looking at the changes in this recent pull, the only files that have changed are some JavaScripts, some HTML, application.rb, routes.rb, and a rake file.
How do I debug this? I'm a bit lost on where to start.
EDIT:
If I roll back to an earlier version, the site works through Passenger, which leads me to believe the problem is within the Rails app rather than an Apache error. Or it could be a permissions thing; can git change file permissions in this way?
IMHO this is a configuration error in Apache or a wrong directory layout. Make sure that the passenger_base_uri still points to the public folder inside your Rails project and that there are no hidden .htaccess files blocking access. Also verify that your symlinks are correct (if there are any), and check your Apache error log.
Start by launching your console to see if rails and your app can be loaded.
In your application root directory, type:
rails console

Using S3/CloudFront with Rails 3 Assets and Less CSS

This one is a mouthful! Basically I'm trying to send all of my Rails 3 assets up to the S3 Cloud and use CloudFront as the CDN to deliver it all. I already learned about configuring Rails to pull from an asset server in production mode. The problem I'm running into is finding a good way to automatically package and send everything to the cloud from a rake command or rails gem. The other problem I have is I don't know if using Less CSS with the More gem is going to screw this up. More generates a CSS file from another directory and places it in public/stylesheets. Any ideas or suggestions are much appreciated! Thanks :)
If you are pushing to Heroku and using the Rails 3.1 asset pipeline, you are all set.
In the CloudFront configuration on Amazon, create your distribution and set the origin to your application's URL.
Then in your production.rb file add:
config.action_controller.asset_host = "xxxxxxxxx.cloudfront.net"
The host is the host of your CloudFront distribution.
Then when you deploy, make sure you are on the Cedar stack and that assets are being compiled. This adds a unique MD5 fingerprint into the filenames. When a request is made to your CDN (handled automatically by the setting in your production.rb file), the CDN will either serve up its version of the file or pull it from the origin first. This means you don't have to push files up to the CDN; they are pulled in automatically.
If you have a file that doesn't have a unique name for some reason, then you will need to look at how to invalidate the cache in CloudFront, but other than that it's pretty easy.

Rails + Heroku + Jammit

I'm working to install Jammit on my Rails 3 app and then to deploy to Heroku.
I installed the Jammit gem and configured assets.yml just fine; it works in dev. But when I pushed to Heroku, the files were 404'ing.
Jammit's usage instructions say you can easily use Jammit within your Rakefile and other scripts:
require 'jammit'
Jammit.package!
I'm not following where/how that works. Running jammit from my site's command line on the Mac yields a "command not found".
Any Jammit users able to help me understand how to move to production with Jammit?
Thanks
I'm using jammit on a Rails 3.0.7 app, on Heroku
gem "jammit", :git => "git://github.com/documentcloud/jammit.git"
I have this in a rake file, to package up the assets before I commit/deploy
desc 'jammit'
task :jam => :environment do
  require 'jammit'
  Jammit.package!
end
And this in .git/hooks/pre-commit so it is done automatically
#!/bin/sh
echo "jamming it"
rake jam
git add public/assets/*
git add public/javascripts/*
By default, the expires time on Heroku was only 12 hours. To increase it (because I have a cache-busting scheme that I am confident in), I put this in config/initializers/heroku.rb:
module Heroku
  class StaticAssetsMiddleware
    def cache_static_asset(reply)
      return reply unless can_cache?(reply)
      status, headers, response = reply
      headers["Expires"] = CGI.rfc1123_date(11.months.from_now)
      build_new_reply(status, headers, response)
    end
  end
end
To decrease the load on my Heroku Rails server, I am also using a free account at CloudFlare, which provides a lightweight reverse proxy/CDN with some decent analytics and security features.
When I get around to caching common content, this thing is really gonna scream!
You could, as I do, use jammit --force to pack your assets, upload everything to S3, and define asset host(s) in Rails. This has the added advantage of keeping your slug smaller and more responsive, as you can add your public directory to .slugignore.
Alternatively, you'll need to work out how to make the Heroku version work despite the read-only file system.
You can also use a git pre-commit hook to ensure your assets are packaged prior to pushing to heroku (or any server). See https://gist.github.com/862102 for an example. You can copy that file to .git/hooks/pre-commit in your project directory.
This one is the solution:
https://github.com/kylejginavan/heroku_jammit
Heroku has a read-only file system, so Jammit is unable to actually store compressed and minified CSS/JS files.
Here is a very good article on the challenge of asset packaging on heroku: http://jimmycuadra.com/posts/the-challenge-of-asset-packaging-on-heroku
