Bulk insert/update Vespa documents via HTTP POST and PUT method - yql

My script will generate a list of documents to be inserted into Vespa. Currently, I'm using HTTP POST/PUT to insert/update one document per request. This can be slow if I need to insert a lot of documents.
Hence, I'd like to know if there's a more efficient method to complete this kind of job. Thank you.

Use the Vespa HTTP client, https://docs.vespa.ai/documentation/vespa-http-client.html, for high feed throughput. There is no direct HTTP API for doing bulk HTTP POST/PUT.
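If you have to stay with one request per document, sending several requests at once can still cut the total time. Below is a minimal sketch of that idea in Python. It is not the Vespa HTTP client mentioned above, only a thread pool around per-document POSTs. The endpoint, the namespace `mynamespace`, the document type `music`, and the /document/v1 path layout are assumptions for illustration, so check them against your own application.

```python
import json
from concurrent.futures import ThreadPoolExecutor
from urllib import request
from urllib.parse import quote

# Assumed values for illustration only; adjust to your own deployment.
ENDPOINT = "http://localhost:8080"
NAMESPACE = "mynamespace"
DOC_TYPE = "music"


def document_url(doc_id):
    """Build the URL for one document (the path layout is an assumption)."""
    safe_id = quote(str(doc_id), safe="")
    return f"{ENDPOINT}/document/v1/{NAMESPACE}/{DOC_TYPE}/docid/{safe_id}"


def http_post(url, body):
    """Send one JSON POST with the standard library and return the status code."""
    req = request.Request(url, data=body, method="POST",
                          headers={"Content-Type": "application/json"})
    with request.urlopen(req) as resp:
        return resp.status


def feed(docs, send=http_post, workers=8):
    """POST each {"id": ..., "fields": {...}} document, several at a time.

    Returns a list of (doc_id, result) pairs in input order.
    """
    def feed_one(doc):
        body = json.dumps({"fields": doc["fields"]}).encode("utf-8")
        return doc["id"], send(document_url(doc["id"]), body)

    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(feed_one, docs))
```

The `send` parameter is there so the HTTP call can be swapped out, for example for a PUT-based update or for a stub while testing.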

Related

Postman: how to run GET requests by reading URLs from various .txt files

I hopefully have a simple request, but I'm unable to do it myself due to Postman's
file support behaviour.
Case:
Lots of people are creating simple URLs for a machine learning tool and pushing them into a database.
The URLs differ only in a counted-up ID number used to request the appropriate information.
However, at the end of the day we have lots of simple single text files, each
with 1 single URL line.
What I want to do is push the whole folder into Postman to finally test all created URLs and save the result as JSON.
Postman does not support text files, and yes, that's crackbrained, but I don't know how to do it.
Any idea is welcome.
Thanks a lot in advance
brgds
You can export a Postman collection and see how the requests in it are exported (note the JSON format of a request).
Now that you know the format of a request, you can create a script that runs through all your files, generates a request out of each URL, and adds it to the exported collection's JSON.
Finally, import the collection back into Postman and you'll have all the requests ready to be tested out.
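A minimal sketch of such a script in Python, assuming one URL per .txt file and the Collection v2.1 layout. The function names and the collection name are made up for illustration, and the request shape should be compared against your own export before relying on it.

```python
import json
from pathlib import Path

# Schema identifier as it appears in Collection v2.1 exports.
SCHEMA = "https://schema.getpostman.com/json/collection/v2.1.0/collection.json"


def build_collection(folder, name="Generated URLs"):
    """Turn every .txt file in `folder` (one URL per file) into a GET request item."""
    items = []
    for path in sorted(Path(folder).glob("*.txt")):
        url = path.read_text(encoding="utf-8").strip()
        if not url:
            continue  # skip empty files
        items.append({
            "name": path.stem,  # file name without the .txt extension
            "request": {"method": "GET", "header": [], "url": url},
        })
    return {"info": {"name": name, "schema": SCHEMA}, "item": items}


def write_collection(folder, out_file):
    """Write the generated collection as JSON so it can be imported into Postman."""
    collection = build_collection(folder)
    Path(out_file).write_text(json.dumps(collection, indent=2), encoding="utf-8")
    return collection
```

Calling `write_collection("urls", "collection.json")` would produce a file to import; the folder and file names here are only examples.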

Send large size data with Rails

I need to send raw emails to customers via my Rails app.
When they click a link, a new page must open and they need to be able to see the source code of an email. I have a lot of cases where the emails are really big (even 40-50 MB), and it takes the server a long time to send them.
E.G.
I have an email with 3 attachments; the total size is 30 MB. My
controller method takes 700 ms to process it and to retrieve the
raw source from the IMAP server, but in the browser it takes up to 5
seconds (2.5 to the first byte, 2.5 to download it).
Right now I just send the string with the render method. Is there a better way? Where am I losing all that time?
To be more clear:
By 'send' I mean the server sending the source code to the browser so the user can view it.
What about storing attachments on your server and including only links to those attachments in the email? This way your emails will be blazingly fast and clients can download attachments separately. If you don't want to store files on your server, you can use Amazon S3 or some other cloud storage (there are many these days).
To ease file uploading, I'd recommend the carrierwave library; Amazon S3 integration works out of the box.
I would suggest using Delayed Job: https://github.com/collectiveidea/delayed_job or Sidekiq: https://github.com/mperham/sidekiq
When the job is in processing state, you can mark that email as Sending.... and once the background job is completed you can mark it as Sent
Hope this helps
I think you could benefit from looking into solutions for HTTP streaming, which would allow you to start sending data while still processing the request.
This is difficult with Rack based servers, so Rails might have some issues with this approach.
Another approach is to try and split the raw source into chunks and request each chunk using Ajax.
This will allow your app to be more responsive and offer a better user experience. This is also known as a perceived performance approach, since the user experiences the app as more responsive even if it takes the same amount of time to load.
If I were trying to resolve the issue, I would look into an Ajax solution that would allow me to leverage IMAP's partial fetch feature, referenced in its RFC.
It's possible to write a simple server-side API that fetches a part of an email or returns a signal when there is no more to download, and then use JavaScript to request the data from the server until the 'no more data' return value is received.
This would allow you to display the downloaded data as it's being received.
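The server-side part of that idea could look roughly like the sketch below, written in Python only for illustration. The `BODY.PEEK[]<offset.count>` form is the partial fetch syntax from the IMAP RFC; `fetch_partial` is a hypothetical callable standing in for whatever IMAP library call actually runs the FETCH, and the chunk size is an arbitrary choice.

```python
def fetch_item(offset, count):
    """IMAP fetch item asking for `count` octets starting at `offset`."""
    return f"BODY.PEEK[]<{offset}.{count}>"


def next_chunk(fetch_partial, offset, chunk_size=64 * 1024):
    """Return one chunk plus what the browser needs for its next request.

    `fetch_partial(item)` is a hypothetical callable that runs the IMAP
    FETCH for the given item and returns the bytes of that section.
    """
    data = fetch_partial(fetch_item(offset, chunk_size))
    return {
        "data": data,
        "next_offset": offset + len(data),
        # A short (or empty) read means the end of the message was reached.
        "done": len(data) < chunk_size,
    }
```

The client would keep requesting with the returned `next_offset` until `done` is true, appending each chunk to the page as it arrives.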

webdav search returns 405

I am building a client application for WebDAV. I have implemented WebDAV methods such as MKCOL, DELETE, PROPFIND, MOVE, and COPY, and they are working fine. When I tried to implement the SEARCH method, the server returned:
405 method not allowed
I am using an Apache2 server. Do I need to change any configuration on the server? I learned from the linked question, How to get the list of folders and files deployed on Linux WebDav?, that some servers do not support the SEARCH method, and the suggestion given there is to use the WebDAV PROPFIND method. So I want to know whether PROPFIND with depth infinity is feasible for a file system with large collections.
You can craft the PROPFIND request to limit the fields that are returned. If you were to limit this request to the searchable parameters, it could work for you.
[is] depth infinity feasible for file system with large collections
It depends, of course, on how large the collections are. You will be receiving several hundred bytes of data for each item in the collection. A collection with millions of objects could result in a pretty big response!
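To illustrate limiting the returned fields, here is a minimal Python sketch that builds a PROPFIND body naming only the wanted properties, together with the Depth header. The property names in the usage note are just examples, and sending the request is left to whichever HTTP client you already use.

```python
import xml.etree.ElementTree as ET

DAV_NS = "DAV:"


def propfind_request(props, depth="1"):
    """Return (headers, body) for a PROPFIND asking only for the named DAV: properties."""
    ET.register_namespace("D", DAV_NS)
    root = ET.Element(f"{{{DAV_NS}}}propfind")
    prop = ET.SubElement(root, f"{{{DAV_NS}}}prop")
    for name in props:
        ET.SubElement(prop, f"{{{DAV_NS}}}{name}")
    body = ET.tostring(root, encoding="utf-8", xml_declaration=True)
    headers = {"Depth": depth, "Content-Type": "application/xml; charset=utf-8"}
    return headers, body
```

For example, `propfind_request(["displayname", "getcontentlength"], depth="infinity")` asks for two properties per item instead of everything, which keeps the per-item response smaller on large collections.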

Can ServletFileUpload.parseRequest() only be called once per request?

I'm working on a custom SpringSecurityFilter for my Grails application and I'm trying to use the commons upload library to process the request. I'm able to process the request in the filter but once it gets to my controller, none of the values are available.
Can the HttpRequest only be processed once by the upload library? I'm guessing it's cleaning up the temp files. Is there a way to keep them around so they can be processed again at the controller level?
I need to interrogate a form parameter for the security (due to the client I can't add it to the http headers) but once I get the value, it seems to wipe the request for further processing.
Yes. A Request can only be parsed once.
I saw this answer on Apache's FAQ page for FileUpload.
Question: Why is parseRequest() returning no items?
Answer: "This most commonly happens when the request has already been parsed, or processed in some other way. Since the input stream has already been consumed by that earlier process, it is no longer available for parsing by Commons FileUpload."
Reference: http://commons.apache.org/fileupload/faq.html
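The same thing happens with any input stream that can only be read once, so the usual workaround is to parse once and keep the result for later callers. The sketch below shows the idea in Python rather than with the Commons FileUpload API; `parse_form` and `CachedParse` are made-up names. In a servlet setting the equivalent would presumably be storing the parsed items on the request in the filter and reading them back in the controller, which is an assumption here and not something the FAQ states.

```python
import io
from urllib.parse import parse_qs


def parse_form(stream):
    """Read the whole stream and parse it as form data. This consumes the stream."""
    return parse_qs(stream.read().decode("utf-8"))


class CachedParse:
    """Parse a one-shot input stream once and hand the same result to later callers."""

    def __init__(self, stream, parser):
        self._stream = stream
        self._parser = parser
        self._result = None
        self._parsed = False

    def items(self):
        if not self._parsed:
            self._result = self._parser(self._stream)
            self._parsed = True
        return self._result


if __name__ == "__main__":
    body = io.BytesIO(b"token=abc&x=1")
    print(parse_form(body))  # first parse sees the data
    print(parse_form(body))  # second parse finds the stream already consumed
```

With `CachedParse`, the filter and the controller would both call `items()` on the same object and get identical results.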

Pubsubhubbub on Rails. How to extract the raw POST body contents from the POST request?

I am having trouble setting up a pubsub-enabled subscriber app using Rails. I have subscribed to the open hub pubsubhubbub.appspot.com and am receiving pings at my application's endpoint (for now I have created a counter which increments every time the endpoint is pinged). But I cannot work out how to extract the raw POST body contents from the POST request. I am new to pubsub and am eager to experiment with it. I came across this blog post but it is not language specific.
Source: Joseph Smarr: Implementing PubSubHubbub subscriber support: A step-by-step guide. http://josephsmarr.com/2010/03/01/implementing-pubsubhubbub-subscriber-support-a-step-by-step-guide/
Now you’re ready for the
pay-out–magically receiving pings from
the ether every time the blog you’ve
subscribed to has new content! You’ll
receive inbound requests to your
specified callback URL without any
additional query parameters added
(i.e. you’ll know it’s a ping and not
a verification because there won’t be
any hub.mode parameter included).
Instead, the new entries of the
subscribed feed will be included
directly in the POST body of the
request, with a request Content-Type
of application/atom+xml for ATOM feeds
and application/rss+xml for RSS
feeds. Depending on your programming
language of choice, you’ll need to
figure out how to extract the raw POST
body contents. For instance, in PHP
you would fopen the special filename
php://input to read it.
Any help would be greatly appreciated.
You didn't say but I'm assuming you are running Rails 3.x?
To get the raw POST body you simply use request.raw_post in your controller. This will give you a long string that looks like a request parameters string: some_var=something&something_else=something_else... which you can then parse to get at what you want.
However, look at your development logs for an incoming request and see if the params hash isn't a better option for you. The service should post the data under some variable name, such as some_var above, and the params hash will hold a params[:some_var] containing only that data. In other words, there is no need for you to dig it out on your own.

Resources