How to import large files (up to 50 GB) into a TextBox - textbox

I want to load a 1.54 GB text file from a URL, but it doesn't load because it's too big. I can load up to 100 MB into the list box, but nothing larger than that. I have other programs that can upload 50 MB files without crashing or raising an error. How can I do this without getting an error and without it taking over 15 minutes to load?
My code, and the message that comes up if there is an error:
' Requires Imports System.Net and Imports System.Text at the top of the file
Me.Refresh()
Dim mimi As New WebClient With {
    .Encoding = Encoding.UTF8
}
Try
    TextBox1.Text = mimi.DownloadString("http://comboxy.com/publiclines.txt")
Catch ex As Exception
    MsgBox("Error Finding Public Lines! Your internet connection may be too slow, the file may be too large, or the lines database may be down. :(", MsgBoxStyle.Exclamation)
    Label1.Text = "Host: 0"
End Try

Related

Reading consoleText URL from within a running build only returns first 10000 lines

I have Groovy code that reads from the current consoleText and does some processing. When I run the code from the IDE it works perfectly, but when I run it as a step in a Jenkins build it only reads 10000 lines out of a total of approximately 2.8 million. The code that reads from the console is:
url.withReader { bufferedReader ->
    String line
    while ((line = bufferedReader.readLine()) != null) {
        // do something
    }
}
The url is
${BUILD_URL}/consoleText
The .../consoleText URL will not "grow" automatically -- it just provides a "snapshot" of console data that's available at query time.
So, if you GET that URL for a build while that build is still running, then you will only see part of the console log. The amount that you see will depend on the time when you issue the GET -- and possibly it will also depend on the status of some buffers.
If this used to work better in the past, then you probably moved the point in time when you tried to read the console.
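Because consoleText is only a snapshot, a log that is still growing has to be read incrementally. The following is a minimal sketch in Python, not the original poster's Groovy. It assumes Jenkins' progressive text endpoint (`logText/progressiveText` with a `start` offset, answering with the `X-Text-Size` and `X-More-Data` headers) and an unauthenticated server. The names `fetch_progressive` and `follow_console` are made up for illustration.

```python
import time
import urllib.request
from typing import Callable, Iterator, Tuple

# (text, next_offset, more_data) for a request starting at byte offset `start`.
Chunk = Tuple[str, int, bool]


def fetch_progressive(build_url: str, start: int) -> Chunk:
    """Fetch console output from byte offset `start` (no authentication handled)."""
    url = f"{build_url.rstrip('/')}/logText/progressiveText?start={start}"
    with urllib.request.urlopen(url) as resp:
        text = resp.read().decode("utf-8", errors="replace")
        # X-Text-Size is the offset to pass as `start` on the next request.
        next_offset = int(resp.headers.get("X-Text-Size", start))
        # X-More-Data is present while the build is still producing output.
        more_data = resp.headers.get("X-More-Data", "").lower() == "true"
    return text, next_offset, more_data


def follow_console(fetch: Callable[[int], Chunk],
                   poll_seconds: float = 2.0) -> Iterator[str]:
    """Yield complete console lines until the server reports no more data."""
    offset, pending = 0, ""
    while True:
        text, offset, more_data = fetch(offset)
        pending += text
        # Everything before the last newline is complete; keep the tail.
        *lines, pending = pending.split("\n")
        yield from lines
        if not more_data:
            break
        time.sleep(poll_seconds)
    if pending:
        yield pending
```

A caller would pass something like `lambda start: fetch_progressive(build_url, start)` as `fetch`. Note that code running inside the build whose log it follows would never see the "no more data" signal, so it would need its own stop condition.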

Error creating vocabulary from big text file on disk

I am trying to run the example from https://cran.r-project.org/web/packages/text2vec/vignettes/files-multicore.html, but with my own file "text": 3.7 GB of plain text, built from a Wikipedia XML dump with the Perl script from http://mattmahoney.net/dc/textdata.html
setwd("c:/rtest")
library(text2vec)
library(doParallel)
N_WORKERS = 2
registerDoParallel(N_WORKERS)
it_files_par = ifiles_parallel(file_paths = "text")
it_token_par = itoken_parallel(it_files_par, preprocessor = tolower, tokenizer = word_tokenizer)
vocab = create_vocabulary(it_token_par)
This causes the error:
Error in unserialize(socklist[[n]]) : error reading from connection
I have 8 GB of RAM, and a word2vec model is created from this file without any errors.
First of all, it doesn't make sense to use parallel iterators on a single file: each file is processed in a separate R worker process, so here it will be worse than just using itoken. It also involves sending the result from each worker to the master process, and here we see that the result is too big to be sent through the socket.
Long story short: just use itoken, or split your file into several smaller files.
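The "split your file" option can be sketched as follows, in Python rather than R. This is an illustration under assumptions, not part of text2vec: the function name `split_on_whitespace`, the part naming scheme and the size parameter are all made up here. It cuts at whitespace rather than at line breaks, on the assumption that a cleaned corpus like this may contain very few newlines.

```python
from pathlib import Path
from typing import List


def split_on_whitespace(src: Path, out_dir: Path, part_bytes: int) -> List[Path]:
    """Split `src` into parts of roughly `part_bytes`, never cutting a word."""
    out_dir.mkdir(parents=True, exist_ok=True)
    parts: List[Path] = []
    carry = b""  # bytes after the last whitespace of the previous block
    with src.open("rb") as fin:
        while True:
            block = fin.read(part_bytes)
            data = carry + block
            if not data:
                break
            if block:
                # Cut just after the last space or newline in this block.
                cut = max(data.rfind(b" "), data.rfind(b"\n")) + 1
                if cut == 0:
                    # No whitespace yet: keep accumulating.
                    carry = data
                    continue
            else:
                # End of file: flush whatever is left.
                cut = len(data)
            part = out_dir / f"{src.name}_part{len(parts):04d}.txt"
            part.write_bytes(data[:cut])
            parts.append(part)
            carry = data[cut:]
    return parts
```

Memory use is bounded by roughly one part at a time, so `part_bytes` should be chosen well below the available RAM. The resulting paths could then be handed to the file iterator as a vector of several files instead of a single one.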

Rails send_data timeout issue

To export to Excel, I use RubyXL to create a workbook from the query result and send_data to download it. The code looks like this:
workbook = RubyXL::Workbook.new
# Fill workbook here
send_data workbook.stream.string, filename: "myrpeort.xlsx", disposition: 'attachment'
It works well when there is not too much data, but when the data size increases, for example when the saved Excel file exceeds 3 MB, the download fails in the browser with the following message:
Network Error (tcp_error)
A communication error occurred: ""
The Web Server may be down, too
busy, or experiencing other problems preventing it from responding to
requests. You may wish to try again at a later time.
It does not seem to be related to the server timeout setting: I even changed the timeout in unicorn to 6000 (100 minutes), and it still did not work.
Could you shed some light on how to solve this issue? Thanks in advance!

Grails (iOS specific): Returning video (mp4) file gives Broken Pipe exception (getOutputStream() has already been called for this response)

I'm trying to return an mp4 file from my Grails controller so that it can be played in the browser. The following is the simplest version of what I have:
def file = new File(<path to mp4 file>)
response.outputStream << file.newInputStream()
The strange thing is that this works when hitting it from a desktop (Chrome on my MacBook), works on an Android phone, but does not work on an iPad Air.
The one header that's different in the iOS request is for "range" of "0-1", but it looks like that might not be causing a problem (tested by adding that request on my laptop).
The exception says:
ERROR errors.GrailsExceptionResolver - SocketException occurred when processing request: [GET]
and further down it says
getOutputStream() has already been called for this response.
I've found many others with similar errors, but they talk about webRequest.setRenderView(false), flushing and closing the outputstream, and many other options. I've tried all of those, but nothing seems to work.
The part that really gets me is that it works on everything except iOS.
Any thoughts would be greatly appreciated. Thanks in advance!
UPDATE 1
Per Graeme's answer below, the accept header from Chrome is:
accept -> text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,*/*;q=0.8
And iOS produces multiple requests, which have the following accept headers:
accept -> text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
accept -> */*
The second accept header, */*, is the one sent with the request that triggers the exception.
I have also created a JIRA issue for Grails:
http://jira.grails.org/browse/GRAILS-11325
Might be related to the Accept header that gets sent, as Grails has some parsing depending on the Accept header. If you could post an example in a JIRA with steps to reproduce that would help.
http://jira.grails.org/browse/GRAILS
This turned out to be an iOS-specific issue. iOS requires the server to honor the range header: if you return the entire file content in response to a range request, iOS will not make additional requests.
The following is the code I used:
// ClientAbortException is assumed to be Tomcat's org.apache.catalina.connector.ClientAbortException
try {
    def rangeValue = request.getHeader("range")
    log.debug("rangeValue: ${rangeValue}")
    if (rangeValue != null) {
        // Get start and end strings; substring(6) removes "bytes="
        def parts = rangeValue.substring(6).split("-")
        def fileSize = file.length()
        def startInt = parts[0].toLong()
        // An open-ended range such as "bytes=0-" has no end value; default to the last byte
        def endInt = parts.length > 1 ? parts[1].toLong() : fileSize - 1
        response.reset()
        response.setStatus(206)
        response.setHeader("Accept-Ranges", "bytes")
        // WARNING: Do not send Content-length, as it appears to prevent videos from working in iOS
        response.setHeader("Content-range", "bytes ${startInt}-${endInt}/${fileSize}")
        response.setContentType("video/quicktime")
        def bytes = new byte[endInt-startInt+1]
        def inputStream = file.newInputStream()
        // Skip to the point in the inputStream that the range is requesting
        inputStream.skip(startInt)
        // Read a chunk of the input stream into the bytes array
        inputStream.read(bytes, 0, bytes.length)
        inputStream.close()
        response.outputStream << bytes
    }
    else {
        response.outputStream << file.newInputStream()
    }
} catch (ClientAbortException e) {
    log.error("User aborted download")
}
There are a few important notes:
If the Content-length header is returned in the response, iOS will not play the video. This may be related to the content being gzipped - https://stackoverflow.com/a/2359184/2601060
The offset argument of inputStream.read() is an offset into the destination array, not into the stream. Reading starts at the stream's current position, which is the beginning for a newly opened stream, so make sure to skip() to the proper position in the file (the startInt) first.
A response can be reset() to make sure that anything that has already been written is not included (this may not be required, but it prevents Grails from applying its default behavior).
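The range handling described above can be sketched independently of Grails. The following Python is an illustration only: the helper names `parse_range`, `read_range` and `content_range` are made up here, and it handles a single range (including the open-ended `bytes=N-` and suffix `bytes=-N` forms) rather than multi-range requests.

```python
from typing import Optional, Tuple


def parse_range(header: Optional[str], file_size: int) -> Optional[Tuple[int, int]]:
    """Return inclusive (start, end) byte offsets, or None if unusable."""
    if not header or not header.startswith("bytes=") or "," in header:
        return None
    first, _, last = header[len("bytes="):].partition("-")
    try:
        if first == "":
            # Suffix form "bytes=-N": the final N bytes of the file.
            length = int(last)
            if length <= 0:
                return None
            start, end = max(file_size - length, 0), file_size - 1
        else:
            start = int(first)
            # Open-ended form "bytes=N-": through the last byte.
            end = int(last) if last else file_size - 1
    except ValueError:
        return None
    end = min(end, file_size - 1)
    if start < 0 or start > end:
        return None
    return start, end


def read_range(path: str, start: int, end: int) -> bytes:
    """Read bytes start..end (inclusive), seeking instead of reading from 0."""
    with open(path, "rb") as f:
        f.seek(start)
        return f.read(end - start + 1)


def content_range(start: int, end: int, file_size: int) -> str:
    """Build the value of the Content-Range header for a 206 response."""
    return f"bytes {start}-{end}/{file_size}"
```

For the `0-1` probe that iOS sends first, `parse_range("bytes=0-1", size)` yields `(0, 1)`, and the response would carry status 206 with the matching Content-Range value and exactly two bytes of body.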

REXML :: RuntimeError (entity expansion has grown too large)

After upgrading to Ruby-1.9.3-p392 today, REXML throws a Runtime Error when attempting to retrieve an XML response over a certain size - everything works fine and no error is thrown when receiving under 25 XML records, but once a certain XML response length threshold is reached, I get this error:
Error occurred while parsing request parameters.
Contents:
RuntimeError (entity expansion has grown too large):
/.rvm/rubies/ruby-1.9.3-p392/lib/ruby/1.9.1/rexml/text.rb:387:in `block in unnormalize'
I realize this was changed in the most recent Ruby version:
http://www.ruby-lang.org/en/news/2013/02/22/rexml-dos-2013-02-22/
As a quick fix, I've changed the size of REXML::Document.entity_expansion_text_limit to a larger number and the error goes away.
Is there a less risky solution?
This issue occurs when you send too much content in an XML response.
To fix it, restrict the data in each individual node to less than 10k: instead of sending the whole data, show truncated data and provide a separate link to view the full content.
The error is raised from the following file:
ruby-2.1.2/lib/ruby/2.1.0/rexml/text.rb
# Unescapes all possible entities
def Text::unnormalize( string, doctype=nil, filter=nil, illegal=nil )
  sum = 0
  string.gsub( /\r\n?/, "\n" ).gsub( REFERENCE ) {
    s = Text.expand($&, doctype, filter)
    if sum + s.bytesize > Security.entity_expansion_text_limit
      raise "entity expansion has grown too large"
    else
      sum += s.bytesize
    end
    s
  }
end
The limit checked in ruby-2.1.2/lib/ruby/2.1.0/rexml/text.rb (Security.entity_expansion_text_limit) defaults to 10240, which means 10k of expanded data per node.
REXML already defaults to allowing only 10000 entity substitutions per document, so the maximum amount of text that can be generated by entity substitution will be around 98 megabytes. (See https://www.ruby-lang.org/en/news/2013/02/22/rexml-dos-2013-02-22/ )
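The "around 98 megabytes" figure is just the product of the two defaults quoted above. A short Python check of the arithmetic follows; the constant names are made up here for readability and are not REXML identifiers.

```python
# Defaults quoted in the text above.
TEXT_LIMIT_BYTES = 10_240      # expanded text allowed per node
MAX_SUBSTITUTIONS = 10_000     # entity substitutions allowed per document

# Worst case: every substitution expands to the full per-node limit.
max_bytes = TEXT_LIMIT_BYTES * MAX_SUBSTITUTIONS
max_mib = max_bytes / 2**20    # bytes -> mebibytes

print(f"{max_bytes} bytes is about {max_mib:.2f} MiB")
```

So the bound is 102,400,000 bytes, a little under 98 MiB. Raising the per-node limit scales this worst case proportionally.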
That sounds like a LOT of XML. Do you really need to get all of it? Maybe you can just request certain fields from the remote server? One option might be to try another XML parser (Nokogiri, for example). Another option might be to use something other than XML as a transport (JSON? Binary?).

Resources