Integrating GCS on a staging Jekyll website

We currently have a staging website, reachable only at an IP address like xx.xx.xxx.xxx, and we would like to integrate and test Google Custom Search (GCS) on it before pushing it live. Is that possible?
If not, is there any alternative to GCS for adding a search bar to a Jekyll blog without using a plugin?
Thanks!

PART I: Google
Google Custom Search cannot index a site that isn't reachable from the public internet.
Well, that's not entirely true: you can arrange something with Google (in theory; I've never done it), but it doesn't look easy. Or cheap.
You could set up a custom search for an unrelated site and embed those results in your local page, if you want to test out CSS prior to launch.
Remember, Google Custom Search also comes with ads unless you pay, and the results tend to look like they came from Google.
PART II: Alternatives
I've looked into this extensively and I haven't come up with a good answer. Here are some not-so-good answers:
1) Tapir Search. This actually worked pretty well, but the service appears to have died. They do have recent Twitter activity, however, so it may be worth checking back in a bit. It's basically a (free) front end for an Elasticsearch server, I think. A neat service, but obviously not super-dependable.
2) Go JavaScript; Lunr, for example. There are many, many similar solutions available. Sadly, they are client-side, and doing a full-text search on even a smallish blog-type site can be very slow. It works okay if you limit the search to titles, but then... you're only searching titles. (See the index-generation sketch after this list.)
3) Build a search engine server. Maybe some breed of Lucene. Upside: very robust search while keeping the snappy response of a flat HTML site. Downside: building and maintaining a search engine server is difficult, expensive and probably overkill.
4) Hosted search engine. Algolia for example. They're basically doing 3) for you. Relatively expensive (~$50/month) but well worth the cost, because, seriously, search engine servers are finicky and prone to explosions. I've never gone this route with Jekyll because I've never had a Jekyll project I was quite that serious about, but I did consider it.
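For option 2, the usual plugin-free trick is to pregenerate a small JSON index at build time and let the client-side library search only that. Here's a minimal Ruby sketch of the build step, assuming the default Jekyll permalink style; the file names and the 500-character excerpt are illustrative, not prescribed:

```ruby
# build_search_index.rb -- run after `jekyll build` to emit a small JSON
# index that a client-side library like Lunr can fetch and search.
require 'json'
require 'yaml'

entries = Dir.glob('_posts/*.md').map do |path|
  raw   = File.read(path)
  parts = raw.split(/^---\s*$/, 3)      # "", front matter, body
  front = YAML.load(parts[1].to_s) || {}
  name  = File.basename(path, '.md')    # e.g. "2015-06-01-my-post"
  date, slug = name[0, 10], name[11..-1]
  {
    'title' => front['title'] || slug,
    'url'   => "/#{date.tr('-', '/')}/#{slug}.html",  # adjust to your permalink setting
    'body'  => parts[2].to_s.strip[0, 500]            # short excerpt keeps the index small
  }
end

File.write('search.json', JSON.pretty_generate(entries))
```

Keeping the indexed body short is the main lever against the slowness mentioned above: the browser only downloads and tokenizes the excerpts, not every full post.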
If anyone has anything to add, I'd love to hear it. This question has been irritating me for a while.

Related

HOW: Static company website with Rails

I have a really stupid question.
I have used WordPress to create websites for a long time, but I don't want to use it anymore, and I'm now looking for a slightly different approach. On the other hand, I am quite new to Ruby on Rails; I have read some books, but I don't feel very confident in this area yet. So, here is the deal:
My friend asked me to create a simple website for his company. He wants only a super simple static website containing these pages:
Home
Products
Contact
Each page will contain simple information, and there is no need to implement contact forms or other such functionality. I also want to deploy this app on Heroku, because he doesn't have a lot of money and we are looking for free hosting. Moreover, I think the best approach here would be some kind of CMS that helps him edit the website himself.
An overview of the final solution:
Static webpages with simple CMS
Using twitter bootstrap for basic layout
Deploy on Heroku
I'd appreciate any contribution on this matter.
Thank you
Everything you have said suggests that you should stick with WordPress. It's perfectly capable of presenting a non-blog static website (use Pages instead of Posts), and there are some excellent themes available. WP has, over the years, really become a CMS that's also good for blogging. Other tools, like Drupal, may also be appropriate.
I set up a WP site with almost exactly the same goals for some very non-technical people; with a little training they eventually learned how to manage the site, upload images, add content, grant permissions to others, and do a lot of other pretty cool stuff. I have been using Rails since 2007, but for that case, it was not the right solution.
Rails is a very (very!) sophisticated web development environment used to build complex and scalable dynamic websites. With that power comes a level of complexity several orders of magnitude higher than WordPress's. Even if you use RefineryCMS, you still need to do a lot of complicated setup and know a lot of stuff. Even if you're using Heroku and following a RailsCast like the one for RefineryCMS, you'll undoubtedly hit some wall where you really need to understand more... Rails is alluring this way -- it seems simple.
If you are using this as a reason to learn Rails, and are willing to invest some time, then by all means go for it. But if you want a simple solution, it's not the way. Learning Rails is like learning to fly a plane, but harder.
For static pages with Rails, you can use the High Voltage gem. You can find detailed usage of this gem in this blog post. Once you create the pages, you can easily deploy your app to Heroku just like a normal Rails app.
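For reference, a minimal High Voltage setup for the three pages in the question looks roughly like this (based on the gem's README conventions; check the docs for your version):

```ruby
# Gemfile
gem 'high_voltage'

# No per-page routes or controllers are needed. Drop plain templates into
# app/views/pages/:
#
#   app/views/pages/home.html.erb
#   app/views/pages/products.html.erb
#   app/views/pages/contact.html.erb
#
# Each is then served at /pages/:id (e.g. /pages/products), and you link
# to them with the bundled route helper:
#
#   <%= link_to 'Products', page_path('products') %>
```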

What is the best conceptual approach, in Rails, to managing content areas in what is otherwise a web application?

A while back I read this great post: http://aaronlongwell.com/2009/06/the-ruby-on-rails-cms-dilemma.html, discussing the "Rails CMS Dilemma". It describes conceptual approaches to managing content in websites vs. web apps. I'm still a beginner with Rails, but I have a bit of a PHP background, and I still have trouble wrapping my brain around this.
A lot of what I run into is customers who want a website that is not 100% website and not 100% web app... That is, perhaps there are several pages of business-to-public facing content, but then there are application elements, and the whole overall look is supposed to be cohesive. This was always fairly simple in PHP, as you just kind of dropped your app code into the PHP "script", etc. (though I know there are plenty of cons to this platform and approach).
So I am wondering, what is the best approach in Rails for doing this?
Say you have an application with user authentication and some sort of CRUD stuff going on, where users collaborate on projects or something. Well, what is the optimal approach for managing the text/images of the "How This Site Works" and "Our Company" pages, which people may also want to view? Is it simply a pages controller and several text fields, with an admin panel on the back end that lets you edit those fields? Or is it perhaps a common approach to start off with something like Refinery, and then build the non-content-driven areas of the site on top of it?
Sorry if this is a dumb question. It's just that I've read Hartl's book and others, and they never address this practical low-level stuff for a beginner... Sure, I can build a Twitter feed now, but what about Twitter's "About" page (http://twitter.com/about)? I can't just throw text into a view and hand that to a client... They want a super easy way to see the site tree, edit content areas, AND administrate/run their Twitter feed or whatever.
Thanks for your help.
I think you're looking for a CMS that runs as a plugin in your Rails application. If that's the case, I'd suggest that you try http://github.com/twg/comfortable-mexican-sofa
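If a full CMS plugin feels heavy, the simpler approach floated in the question (a pages controller backed by editable fields) is also perfectly workable. A minimal sketch with illustrative names; pair it with a small admin namespace or an admin gem so the client can edit the body fields:

```ruby
# db/migrate/xxx_create_pages.rb -- a slug to address each page and a
# body the client can edit from an admin panel.
class CreatePages < ActiveRecord::Migration
  def change
    create_table :pages do |t|
      t.string :slug, :null => false
      t.string :title
      t.text   :body
      t.timestamps
    end
    add_index :pages, :slug, :unique => true
  end
end

# app/models/page.rb
class Page < ActiveRecord::Base
  validates :slug, :presence => true, :uniqueness => true
end

# app/controllers/pages_controller.rb
class PagesController < ApplicationController
  def show
    @page = Page.find_by_slug!(params[:slug])  # 404s via RecordNotFound
  end
end

# config/routes.rb
get 'pages/:slug' => 'pages#show', :as => :page
```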

Best search option for a heroku-hosted Rails app?

I've been working on a new project lately where a fantastic search engine is crucial. It's a Rails 3 app hosted on Heroku, and I'm looking into possible solutions (a Ruby gem would be ideal) that offer an easy way to get powerful full-text search.
Right now I'm using acts_as_tsearch, which leverages PostgreSQL and performs a basic MATCH query. However, it's not really pulling back good results (for example, if I search for "create a project" and "how do i create a project" exists as an entry, it doesn't find it).
Can anyone share their experiences with full-text search? Has anyone tried out Solr?
IndexTank is your best bet. They were recently added as a Heroku add-on.
We recently tried to just run our own search for our Heroku app and it's just not worth it because you have to worry about stability and scaling of that search box. It's better to go with a provider, like IndexTank.
IndexTank also powers Reddit and Wordpress.com, so you can bet it'll be reliable.
SOLR works very nicely -- it's a bit pricey to get started ($20 a month), but it just works, and works well.
They recently added the ability to ask the user "Did you mean to search for [correct spelling]".
You can easily cross-model search (search for Users and Cars and Dealerships).
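If you do go the Solr route, Sunspot is the usual Ruby client, and cross-model search is a one-liner. A rough sketch with illustrative models and fields:

```ruby
# app/models: declare which fields get indexed in Solr.
class User < ActiveRecord::Base
  searchable do
    text :name, :bio
  end
end

class Car < ActiveRecord::Base
  searchable do
    text :make, :description
  end
end

# One query across both models; results come back as a mixed array.
search = Sunspot.search(User, Car) do
  fulltext 'turbo'
end
search.results  # => [#<User ...>, #<Car ...>, ...]
```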
Heroku offers add-ons which you can easily attach to your application. You should take a look at Solr and IndexTank.
There's a free solution in the Texticle gem. It uses PostgreSQL's full-text index support (8.3 and later) and creates a search method on your models. If you create indexes, the speed is very good (for a free solution).
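A rough sketch of what that looks like, plus the raw tsearch query it boils down to (the model and column names are illustrative, and the exact Texticle API may differ between versions):

```ruby
# Gemfile
gem 'texticle'

# Texticle adds a full-text `search` class method to your models:
Faq.search('create a project')

# Under the hood this is Postgres tsearch. The raw equivalent -- which also
# shows why the example from the question matches once stemming and stopword
# removal kick in: a stored "how do i create a project" reduces to the
# lexemes 'creat' & 'project', the same as the query "create a project".
Faq.where(
  "to_tsvector('english', question) @@ plainto_tsquery('english', ?)",
  'create a project'
)
```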
Hope that helps!

Best way to add full web search to my site?

I need to add full web search to my site. I need something like Google Custom Search but with no ads and it has to be free. Any recommendation of a web service or open source project that can index my site and allow me to search it will be helpful.
My site is made in Ruby on Rails, if that helps.
I'll make this question community-wiki so you can edit my bad English. I think many people can benefit from this question.
Check out Lucene. It's an open-source search engine that will certainly be a fun learning experience to implement on your own site. It was originally written by Doug Cutting, formerly of Excite, I do believe.
Ferret is the Ruby port of Lucene. Check out the acts_as_ferret plugin.
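From memory of the plugin's README, usage is roughly like this (field names are illustrative):

```ruby
# app/models/article.rb -- index the fields you want searchable.
class Article < ActiveRecord::Base
  acts_as_ferret :fields => [:title, :body]
end

# Query through the generated finder:
Article.find_with_ferret('open source search')
```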
It depends what you mean by full web search, really. If you want to search the whole web, then the answers above won't help you much, as they are really for indexing and searching the content of your own site. I would suggest using Google AJAX Search (just a 'powered by Google' attribution needed, no ads) or BOSS from Yahoo (which might require ads, I'm not sure).
http://code.google.com/apis/ajaxsearch/
http://developer.yahoo.com/search/boss/
People are moving to acts_as_solr and Thinking Sphinx in the blogs I read:
http://acts-as-solr.rubyforge.org/
http://ts.freelancing-gods.com/
I've also been looking at tsearch in Postgres; it looks very capable:
http://www.sai.msu.su/~megera/postgres/gist/tsearch/V2/
What do you mean by "full web search"?
There are good answers available for full-text search, where a search engine indexes and queries the model objects stored in your database.
If you mean something that indexes and queries your rendered HTML, Nutch is a popular option with a web-crawler, parser, indexer, and query interface.
I recommend acts_as_xapian. It's very easy to implement, it's fast enough, and it's got the features you'll normally need.

Where do search engines start crawling?

What do search engine bots use as a starting point? Is it DNS look-up or do they start with some fixed list of well-know sites? Any guesses or suggestions?
Your question can be interpreted in two ways:
Are you asking where search engines start their crawl from in general, or where they start to crawl a particular site?
I don't know how the big players work, but if you were to make your own search engine you'd probably seed it with popular portal sites. DMOZ.org seems to be a popular starting point. Since the big players have so much more data than we do, they probably start their crawls from a variety of places.
If you're asking where an SE starts to crawl your particular site, it probably has a lot to do with which of your pages are the most popular. I imagine that if you have one super popular page that lots of other sites link to, that would be the page SEs enter from, simply because it offers so many more entry points from other sites.
Note that I am not in SEO or anything; I just studied bot and SE traffic for a while for a project I was working on.
You can submit your site to search engines using their site submission forms; this will get you into their system. Exactly when you get crawled after that is impossible to say, but from experience it's usually about a week or so for an initial crawl (the homepage, plus a couple of other pages one link deep from there). You can increase how many of your pages get crawled and indexed by using a clear semantic link structure and submitting a sitemap. A sitemap lets you list all of your pages and weight them relative to one another, which helps the search engines understand how important you consider each part of the site to be (see the sketch below).
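A sitemap is just XML, so it's easy to generate yourself. A tiny Ruby sketch that emits one with per-page weights (the URLs and priorities are illustrative):

```ruby
# generate_sitemap.rb -- write a minimal sitemap.xml; priority runs from
# 0.0 to 1.0 and is relative to your own pages, not to other sites.
pages = [
  { :loc => 'http://example.com/',        :priority => 1.0 },
  { :loc => 'http://example.com/about',   :priority => 0.6 },
  { :loc => 'http://example.com/contact', :priority => 0.3 },
]

File.open('sitemap.xml', 'w') do |f|
  f.puts '<?xml version="1.0" encoding="UTF-8"?>'
  f.puts '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  pages.each do |page|
    f.puts "  <url><loc>#{page[:loc]}</loc><priority>#{page[:priority]}</priority></url>"
  end
  f.puts '</urlset>'
end
```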
If your site is linked from other crawled websites, then your site will also be crawled, starting with the linked page and eventually spreading to the rest of your site. This can take a long time and depends on the crawl frequency of the linking sites, so URL submission is the quickest way to let Google know about you!
One tool I can't recommend highly enough is Google Webmaster Tools. It allows you to see how often you've been crawled and any errors the Googlebot has stumbled across (broken links, etc.), and it has a host of other useful tools in there.
In principle they start with nothing. Only when somebody explicitly tells them to include their website can they start crawling that site and follow its links to find more.
However, in practice the creator(s) of a search engine will put in some arbitrary sites they can think of. For example, their own blogs or the sites they have in their bookmarks.
In theory one could also just pick some random addresses and see if there is a website there. I doubt anyone does this, though; the above method will work just fine and does not require extra coding just to bootstrap the search engine.
