FAST ESP vs Google Search Appliance for development - comparison

Which of the two provides a better API for developing on top of?
Although there is a virtual Google Search Appliance available for download, no such equivalent is present for FAST.
So I'm looking to developers with experience with either of these products for suggestions and links to documentation (especially for FAST, as there's none available on their site).
Kind regards,

I'm pretty sure that FAST does not provide a trial download of their Enterprise Search Platform (ESP) today, nor of its SDK (which is useless without ESP).
FAST is pretty much the industry leader for customization (Google is popular as a simple out-of-the-box solution and Autonomy seems to be the leader in compliance), which is what you are likely interested in an API for. But it is not cheap. It offers internal Python customization for processing documents and external .NET & Java APIs for interacting with the service.
Also, if you are looking for basic enterprise search plus an API, look up the Apache "Solr" project.

I think FAST provides a free trial version. Along with it come the API documentation and other manuals. My company uses it. I use it.
Answering your question, FAST is obviously better than Google Search Appliance (for various reasons). That's my view.
Freddie

I have worked on Google Search Appliance and it works great.
I can search within metadata, get selective data back from a query, see the real-time status of documents being crawled, and scale with GSA 6.14, all with great support from Google.

Apache Solr is a great solution with a very flexible client API. You should definitely check it out. We are currently moving from FAST to Solr, and I find the features and API of Solr much better than FAST ESP's.
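For anyone weighing up Solr's API, here is a minimal query sketch using the SolrJ Java client. It assumes a Solr core named "products" on the default port, and the field names ("id", "title", "inStock") are hypothetical:

    import org.apache.solr.client.solrj.SolrClient;
    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.HttpSolrClient;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class SolrQueryExample {
        public static void main(String[] args) throws Exception {
            // Connect to a (hypothetical) core named "products" on a local Solr instance.
            SolrClient solr = new HttpSolrClient.Builder("http://localhost:8983/solr/products").build();

            // Full-text query with a filter and a row limit.
            SolrQuery query = new SolrQuery("title:camera");
            query.addFilterQuery("inStock:true");
            query.setRows(10);

            QueryResponse response = solr.query(query);
            for (SolrDocument doc : response.getResults()) {
                System.out.println(doc.getFieldValue("id") + " - " + doc.getFieldValue("title"));
            }
            solr.close();
        }
    }

Facets, highlighting and similar features are configured on the same SolrQuery object, which is a large part of why the client API feels flexible.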


GIS library for ASP.net MVC

I am trying to develop an ASP.net MVC website in which I need to show a map (whole world) with several markers and additional information for every marker.
Does anyone know a good library that supports this and which (if possible) also lets me use "offline" maps stored on my own server (OpenStreetMap, for example)?
It will be an intranet application, which means that yearly license costs would have to be paid in the case of Google/Bing. The customer doesn't want that, but in general the library can be commercial (a one-time per-server and/or per-developer fee).
I have already had a look at "ThinkGeo Map Suite"; any other suggestions or recommendations?
SharpMap is flexible and easy to implement; it can work with shapefiles or with a spatial database.
For details, see http://sharpmap.codeplex.com/
There are a couple of examples that do exactly what you need, so you can start from those.
Other libraries exist, but I haven't tried them; research other options as well.
Manifold is a very inexpensive system that has a basic internet map server framework:
http://www.manifold.net/info/ims.shtml
You already mentioned ThinkGeo; I would put it in the same basket as Manifold. Be sure to evaluate performance and limitations with both packages.
You'd be hard pressed to find a pure .NET library for mapping that works well and won't blow out your budget (see ESRI). Depending on your skill level and your knowledge of GIS systems, I would suggest setting up your own web map server and just embedding it in your web application (see the sketch after the list below).
Some good environments for this which I can recommend are:
MapServer
GeoServer
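Both MapServer and GeoServer expose rendered maps through the standard OGC WMS protocol, so your web pages ultimately just request images over HTTP. As a rough sketch of the request shape (written in Java purely for illustration; the GeoServer URL, layer name and bounding box are hypothetical), a WMS GetMap call looks like this:

    import java.io.InputStream;
    import java.net.URL;
    import java.nio.file.Files;
    import java.nio.file.Paths;
    import java.nio.file.StandardCopyOption;

    public class WmsGetMapExample {
        public static void main(String[] args) throws Exception {
            // Hypothetical local GeoServer; layer name and bounding box are placeholders.
            String url = "http://localhost:8080/geoserver/wms"
                    + "?SERVICE=WMS&VERSION=1.1.1&REQUEST=GetMap"
                    + "&LAYERS=world:countries&STYLES="
                    + "&SRS=EPSG:4326&BBOX=-180,-90,180,90"
                    + "&WIDTH=800&HEIGHT=400&FORMAT=image/png";
            try (InputStream in = new URL(url).openStream()) {
                // Save the rendered map image; a web page would normally just reference this URL.
                Files.copy(in, Paths.get("world.png"), StandardCopyOption.REPLACE_EXISTING);
            }
        }
    }

In a real application a client-side mapping library would normally build these requests and handle panning, zooming and markers for you.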
As far as displaying and interacting with the map, there are several web based platforms available:
GIS SDKs For Web Apps

Image/File hosting storage best practices and standards

We are building an image and file hosting website and will save these files on our own servers, so I want to know whether there are any best practices or standards I should read and follow to make our website scalable and easy to extend in the future.
If there are books, articles, or videos on this subject, please share them.
In my experience dealing with large amounts of data, it's always best to opt for the cloud; check out "Amazon S3" (Amazon AWS) or Windows Azure.
Features like a CDN (CloudFront) are a big plus.
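As a rough illustration of how little code the cloud route takes, here is a minimal upload sketch using the AWS SDK for Java; the bucket name and key are hypothetical, and credentials are assumed to come from the default provider chain:

    import com.amazonaws.services.s3.AmazonS3;
    import com.amazonaws.services.s3.AmazonS3ClientBuilder;
    import java.io.File;

    public class S3UploadExample {
        public static void main(String[] args) {
            // Region and credentials are resolved from the environment / ~/.aws configuration.
            AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();

            // Upload a local file to a (hypothetical) bucket under a (hypothetical) key.
            s3.putObject("my-image-hosting-bucket", "uploads/photo.jpg", new File("photo.jpg"));

            // Serving the object through CloudFront is then a matter of pointing a
            // distribution at the bucket rather than writing more application code.
        }
    }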
I believe this is not a simple question that can be answered without knowing:
How many files are expected?
How many user/file accesses per hour/day/minute?
What are your usage scenarios for these files (downloading? streaming? how many concurrent downloads at once)?
Are you tied to one particular OS (Windows) and filesystem (NTFS), or is there freedom here?
My personal note: building your own image/file hosting is not a trivial task; I strongly recommend hiring somebody with experience in this area.
I would recommend that, if possible, you look at a 3rd-party solution that provides an API. You'll then get the benefits of lower cost of ownership, no maintenance costs for the hardware, and continual updates thrown in for free when the 3rd party adds new features to the core offering. I know this from first-hand experience, as we scoped out the options for doing this in a recent project and came to the conclusion that we'd spend 100 times more on our own solution and, even then, might not get it right. We opted for a company called Razuna, who offer both a hosted and an open-source version of their platform. Their API is very straightforward and can be consumed inside your MVC app with potentially only a few days' effort (depending on your use case). The beauty of this approach is that the hosted elements are actually on the Nirvanix backbone and are served via their CDN - so it's a win-win.
You can get the details at:
http://www.razuna.com
and you can view the API docs at:
http://wiki.razuna.com/display/ecp/Developer+Guides
Good luck, and if you need any further real-life guidance on this, feel free to come back. Oh, and by the way, we were also able to ask for 'paid for' features to be added to the core offering at pretty much standard market day rates.

Open Alternatives to Google Prediction API

A recent announcement by Google about the Google Prediction API sounded very interesting. It could be useful for a project that is coming up, and would probably do a better job than some custom code I was considering.
However, there is some vendor lock-in: Google retains the trained model and could later choose to overcharge me for it. It occurred to me that there are probably open-source equivalents, if I were willing to host the training myself (I am) and live without their ability to throw hardware at the problem at a moment's notice.
The last time I looked at third-party machine-learning code was many years ago, and there were a lot of details that needed to be carefully considered and customised for your project. Google appears to have hidden those decisions and takes care of them for you. To me, this is still indistinguishable from magic, but I would like to hear whether others can do the same.
So my question is:
What alternatives to Google Prediction API exist which:
categorise data with supervised machine learning,
can be easily configured (or don't need configuration) for different kinds and scales of data sets, and
are open-source and self-hosted (or, at the very least, provide royalty-free use of your model, without a dependence on a third party)?
Maybe Apache Mahout?
PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery.
I have recently been looking at tools like the Google Prediction API; one of the first ones I was pointed to was the Weka machine learning toolkit, which could be worth checking out for anyone looking.
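To give a feel for what Weka involves, here is a minimal supervised-classification sketch using its Java API; the ARFF file name is hypothetical, and the last attribute is assumed to be the class label:

    import weka.classifiers.Classifier;
    import weka.classifiers.bayes.NaiveBayes;
    import weka.core.Instances;
    import weka.core.converters.ConverterUtils.DataSource;

    public class WekaTrainExample {
        public static void main(String[] args) throws Exception {
            // Load a labelled training set from a (hypothetical) ARFF file.
            Instances data = new DataSource("training-data.arff").getDataSet();
            data.setClassIndex(data.numAttributes() - 1); // assume the last attribute is the label

            // Train a simple classifier; Weka ships many others with the same interface.
            Classifier model = new NaiveBayes();
            model.buildClassifier(data);

            // Classify the first instance as a quick sanity check.
            double predicted = model.classifyInstance(data.instance(0));
            System.out.println("Predicted class: " + data.classAttribute().value((int) predicted));
        }
    }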
I'm not sure if it's relevant, but directededge seems to be doing exactly that :)
There is a good free-to-use service, Yandex Predictor, with a quota of 100,000 requests per day. It works for text only and supports several languages and spell correction.
You need to get a free API key; then you can use the simple RESTful API. The API supports JSON, XML, and JSONP as output.
Unfortunately I cannot find documentation in English. You can use Google Translate.
I can translate docs if there is some demand.

Anyone implemented Endeca with .NET? Would you recommend Endeca or FAST?

Which search engine would you recommend for a Commerce website?
We have millions of products in a catalog and we want it to be as quick as possible.
We would also want to make sure that the marketing driven through the search engine will be fast and effective.
What are your opinions?
This is only half an answer to your question: I've used it with Java and not .NET. FAST is said to be the better search engine; I don't know. However, for commerce Endeca is considered to be the best. I've used it with a catalog of 5 million products and queries are very, very fast.
Whether you use .NET or Java does not matter; in the end solution the search engine stays the same.
And which search engine to use is not easily answered; it all depends on what you want to or can spend. My experiences with Endeca are very positive.
We've been using Endeca for several .NET e-commerce websites. It certainly gives us faster full-text search with little coding compared with SQL Server, but Endeca is overly complex, and it costs us a lot of time to update and configure. Its query capability is quite limited; it lacks the flexibility we are used to with SQL queries.
I'm going to reduce the Endeca dependency by utilizing Lucene.Net for the search part.
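Lucene.Net mirrors the Java Lucene API very closely, so to show roughly what that search part looks like, here is a minimal index-and-query sketch in Java (recent Lucene versions; the field name and index path are hypothetical) that translates almost line-for-line to Lucene.Net:

    import java.nio.file.Paths;
    import org.apache.lucene.analysis.standard.StandardAnalyzer;
    import org.apache.lucene.document.Document;
    import org.apache.lucene.document.Field;
    import org.apache.lucene.document.TextField;
    import org.apache.lucene.index.DirectoryReader;
    import org.apache.lucene.index.IndexWriter;
    import org.apache.lucene.index.IndexWriterConfig;
    import org.apache.lucene.queryparser.classic.QueryParser;
    import org.apache.lucene.search.IndexSearcher;
    import org.apache.lucene.search.ScoreDoc;
    import org.apache.lucene.search.TopDocs;
    import org.apache.lucene.store.Directory;
    import org.apache.lucene.store.FSDirectory;

    public class ProductSearchExample {
        public static void main(String[] args) throws Exception {
            Directory index = FSDirectory.open(Paths.get("product-index"));
            StandardAnalyzer analyzer = new StandardAnalyzer();

            // Index a single product document.
            try (IndexWriter writer = new IndexWriter(index, new IndexWriterConfig(analyzer))) {
                Document doc = new Document();
                doc.add(new TextField("name", "Stainless steel water bottle", Field.Store.YES));
                writer.addDocument(doc);
            }

            // Run a full-text query against the "name" field.
            try (DirectoryReader reader = DirectoryReader.open(index)) {
                IndexSearcher searcher = new IndexSearcher(reader);
                TopDocs hits = searcher.search(new QueryParser("name", analyzer).parse("water"), 10);
                for (ScoreDoc hit : hits.scoreDocs) {
                    System.out.println(searcher.doc(hit.doc).get("name"));
                }
            }
        }
    }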
I have been involved in several .NET implementations of Endeca and have been happy every time. The biggest advantage of Endeca over FAST is the cost and time of implementation.
My recommendation is to document your requirements and send out an RFP. Make sure you include the following as part of the RFP:
A demo of the proposed solution (make sure they clearly explain within the demo which features are included in the cost of the proposal and which features cost extra).
Examples of existing customers that have implemented this solution on top of the same commerce software you are using.
Software licensing cost (you will need to provide details about the number of records you have in your commerce catalog, as both companies price based on this).
A detailed list of available modules/plugins and their respective costs.
Implementation cost.
Implementation schedule.
Hope this helps.
Endeca is the best commercial product in my own honest opinion. We've been using it for our catalog of millions of records.
Or you can try Lucene.NET
One thing to consider before buying Endeca is that Oracle licenses the product by physical CPU present in a server. So if you were considering virtualizing Endeca servers onto a VCE or other blade virtualization server, you would have to pay for licenses for all of the CPU blades in the appliance, even if you were only utilizing one of them for Endeca. This makes Endeca suitable only for physical server installations, strictly because of Oracle licensing issues.

What is the best Delphi n-tier low bandwidth technology?

I need to deploy a Delphi app in an environment that needs a centralized data and file storage system (for document imaging) but has multiple branch offices with relatively poor interconnectivity. I believe a 3-tier database application is the best way to go, so I can provide a rich desktop experience with relatively lightweight data transfer needs. So far I have looked briefly at Delphi DataSnap, kbmMW and RemObjects SDK. It seems that kbmMW and RemObjects SDK use the least bandwidth. Does anyone have any experience deploying any of these technologies in a challenging environment with a significant number of users (I need to support 700+)? Thanks!
It depends on whether you are tied to remote datasets. If you aren't dataset-bound, then SOAP would likely be a good choice. Or, what I've done is write my own protocol that is similar to SOAP in nature. This was done before SOAP was a standard, and I'm glad I did - it gives you the ability to control more of the flow of data. It's a given that if you have poor connectivity, you will be spending time supporting it. It's very nice if it's your own code you are supporting versus having to wait on a vendor. (Although kbmMW and RemObjects are known to be pretty good vendors.)
Personal note: 700 users in a document imaging application over poor connectivity sounds like a mess. Spend the money on upgrading connectivity as it'll be cheaper in the long run.
Both kbmMW and RO SDK offer a binary format, which is more compact than the SOAP format, especially if you are working with documents.
RO SDK seems to offer more GUI tools to help you build your services.
Also give the RealThinClient SDK a look; it's a lightweight remoting framework.
But whatever framework you go with, your design will make it fast or slow. I have some applications working on slow 128 kb lines, and they work perfectly without any user complaints, but I don't do large file transfers.
One thing to remember... it's not the number of users, but the number of them using the resources at the same time, that will be the issue. Attempt to develop your application "server stateless" if at all possible; this will allow greater flexibility in the long term if you find you have to add more servers to the pool to support your customer base. The hardest thing about n-tier is scaling beyond the first server... plan for that from the start. Each request should not know anything about a prior request, or at the very least the request should have a way of passing the context so the server can look it up in a session table or something.
Personally, I would recommend RemObjects. I have used it with good results.
I don't know if it's the very best / most efficient (glad you asked this question!), but I've had good results with RemObjects SDK + DataAbstract. The latter made much of the plumbing less involved, which was helpful. Still implementing, but so far so good.
If you really want to go "low-bandwidth", use the BSD sockets API - that'll give you full control over what's being sent, and you can send as little information as you want. Of course, then you'll have to implement all the tiers yourself, but hey - that's still an option :D
