How to change data length parameter in maxima software? - maxima

I need to use maxima software to deal with data. I try to read data from a text file constructed as
1 2 3
11 22 33
ect.
Following comands allow for loading data sufficiently.
load(numericalio);
read_matrix("path to the file");
The problem arises when I apply them to a more realistic (larger) data set. In this case the message appears Expression longer than allowed by the configuration setting.
How to overcome this problem? I cannot see any option in configuration menu. I would be grateful for advice.

I ran into the same error message today, at it seems to be related to the size of the output that wxMaxima receives from the Maxima executable.
If you wish to display the output regardless, you can change it in the configuration here:
Edit>Configure>Worksheet>Show long expressions
Note that showing a massive expression or amount of data may dramatically slow the program down, so consider hiding the output (use a $ instead of a ; at the end of your lines) if you don't need to visualize the data.

Related

Abaqus: efficiently export large xy data set from Abaqus

I am trying to export XY data objects from sets of the size of 20-40k elements, but Abaqus is slowing down considerably, and even crashing. In fact, when I create the xy data, Abaqus gives me a warning saying that "the number of xyDataObjects is very large, and might cause performance issues". And so it does.
My usual procedure to is to create the xy data and then export in rpt format. Can someone suggest another method less prone to crashing? Would it be more efficient to divide the output element set into two or more subsets, and concatenate them after exporting?
The method recommended by #hgazibara in the comments is certainly sufficient, but it is laborious.
An easier method, I found, is a package called Abaqus2Matlab, which scrapes any variable you want from the odb. See here: http://www.abaqus2matlab.com/

Talend- Memory issues. Working with big files

Before admins start to eating me alive, I would like to say to my defense that I cannot comment in the original publications, because I do not have the power, therefore, I have to ask about this again.
I have issues running a job in talend (Open Studio for BIG DATA!). I have an archive of 3 gb. I do not consider that this is too much since I have a computer that has 32 GB in RAM.
While trying to run my job, first I got an error related to heap memory issue, then it changed for a garbage collector error, and now It doesn't even give me an error. (just do nothing and then stops)
I found this SOLUTIONS and:
a) Talend performance
#Kailash commented that parallel is only on the condition that I have to be subscribed to one of the Talend Platform solutions. My comment/question: So there is no other similar option to parallelize a job with a 3Gb archive size?
b) Talend 10 GB input and lookup out of memory error
#54l3d mentioned that its an option to split the lookup file into manageable chunks (may be 500M), then perform the join in many stages for each chunk. My comment/cry for help/question: how can I do that, I do not know how to split the look up, can someone explain this to me a little bit more graphical
c) How to push a big file data in talend?
just to mention that I also went through the "c" but I don't have any comment about it.
The job I am performing (thanks to #iMezouar) looks like this:
1) I have an inputFile MySQLInput coming from a DB in MySQL (3GB)
2) I used the tFirstRows to make it easier for the process (not working)
3) I used the tSplitRow to transform the data form many simmilar columns to only one column.
4) MySQLOutput
enter image description here
Thanks again for reading me and double thanks for answering.
From what I understand, your query returns a lot of data (3GB), and that is causing an error in your job. I suggest the following :
1. Filter data on the database side : replace tSampleRow by a WHERE clause in your tMysqlInput component in order to retrieve fewer rows in Talend.
2. MySQL jdbc driver by default retrieves all data into memory, so you need to use the stream option in tMysqlInput's advanced settings in order to stream rows.

how to find a particular redis key memory size in lua script

redis.call('select','14')
local allKeys = redis.call('keys','orgId#1:logs:email:uid#*')
for i = 1 , #allKeys ,1
do
local object11 = redis.call('DEBUG OBJECT',allKeys[i])
print("kk",object11[1])
end
Here "DEBUG OBJECT" is run successfully on redis-cli, but if we want to run through lua script on multiple key. That send error like this.
(error) ERR Error running script (call to f_b003d960240545d9540ebc2319d863221045
3815): Wrong number of args calling Redis command From Lua script
DEBUG OBJECT is not a good bet. It shows the serialized length of the value, so it is just the size of the object once stored on an RDB file.
To have some hint about the size of an object in Redis, you need to resort to more complex techniques, but you can only get an approximation. You need to run:
TYPE
OBJECT ENCODING
The object-type specific command to get its length.
Sample a few elements to understand the average string length of the object.
Based on this four informations, you need to check the Redis source code to check the different memory footprints of the internal structures used, and do the math. Not easy...
A much more viable approximation is to just to use:
APPROX_USED_MEM = num_elements * avg_size * overhead_factor
You may want to pick an overhead factor which makes sense for a variety of data types. The error is big, but is an approximation good enough for some use cases. Maybe overhead_factor may be something like 2.
TLDR: What you are trying to do is complex and leads to errors. In the future the idea is to provide a MEMORY command which is able to do this.

retrieve sequence alignment score produced by emboss in biopython

I'm trying to retrieve the alignment score of two sequences compared using emboss in biopython. The only way that I know is to retrieve it from an output text file produced by emboss. The problem is that there will be hundreds of these files to iterate over. Is there an easier/cleaner method to retrieve the alignment score, without resorting to that? This is the main part of the code that I'm using.
From Bio.Emboss.Applications import StretcherCommandline
needle_cline = StretcherCommandline(asequence=,bsequence=,gapopen=,gapextend=,outfile=)
stdout, stderr = needle_cline()
I had the same problem and after some time spent on searching for a neat solution I popped up a white flag.
However, to speed up significantly the processing of output files I did the following things:
1) I used re python module for handling regular expressions to extract all data needed.
2) I created a ramdisk space for the output files. The use of a ramdisk here allowed for processing and exchanging all the data in RAM memory (much faster than writing and reading the output files from a hard drive, not to mention it saves your hdd in case of processing massive number of alignments).
I don't know if there is one specifically for your command.
For Primer3CommandLine, there is Primer3. Make your life much easier with something like:
from Bio.Emboss import Primer3
inputFile = "./wherever/your/outputfileis.out"
with open(inputFile) as fileHandle:
record = Primer3.parse(fileHandle)
# XXX check is len>0
primers = record.next().primers
numPrimers = len(primers)
# you should have access to each primer, using a for loop
# to check how to access the data you care about. For example:
I would also check http://biopython.org/wiki/SeqIO#Sequence_Input

Comparing using Map Reduce(Cloudera Hadoop 0.20.2) two text files of size of almost 3GB

I'm trying to do the following in hadoop map/reduce( written in java, linux kernel OS)
Text files 'rules-1' and 'rules-2' (total 3GB in size) contains some rules, each rule are separated by endline character, so the files can be read using readLine() function.
These files 'rules-1' and 'rules-2' needs to be imported as a whole from hdfs in every map function in my cluster i.e. these file are not splittable across different map function.
Input to the mapper's map function is a text file called 'record' (each line is terminated by endline character), so from the 'record' file we get the (key, value) pair. The file is splittable and can be given as input to different map function used in the whole map/reduce process.
What needs to be done is compare each value(i.e. lines from record file) with the rules inside 'rules-1' and 'rules-2'
Problem is, if I pull out each line of rules-1 and rules-2 files to a static arraylist only once, so that each mapper can share the same arraylint and try to compare elements in the arraylist with the each input value from the record file, I get a memory overflow error, since 3GB cannot be stored at a time in the arraylist.
Alternatively, if I import only few lines from the rules-1 and rules-2 files at a time and compare them to each value, map/reduce is taking a lot time to finish its job.
Could you guys provide me any other alternative ideas how can this be done without the memory overflow error? Will it help if I put those file-1 and file-2 inside a hdfs supporting database or something? I'm going out of ideas actually.Would really appreciate if some of you guys could provide me your valuable suggestions.
Iif you input files are small - you can load them into static variables and use rules as an input.
If above is not a case I can suggest the following ways:
a) To give rule-1 and rule-2 high replication factor close to the number of nodes you have. Then you can read from HDFS rule=1 and rule-2 for each record in the input relatively efficient - because it will be sequential read from the local datanode.
b) If you can consider some hash function which, when applied to the rule and to the input string will predict without false negatives that they can match - then you can emit this hash for rules, input record and resolve all possible matches in the reducer. It will be very similar to the way how a join is done using MR
c) I would consider some other optimization techniques like building search trees, or sorting since otherwise the problem looks computationally expensive and will took forever...
On this page find Real-World Cluster Configurations
it will cover file size configuration
You could use the param "mapred.child.java.opts" in conf/mapred-site.xml to increase the memory for your mappers. You might not be able to run as many map slots per server but with more servers in your cluster you could still parallelize your job.
Read the content text file from the MapReduce function and read the keyword text file from the mapper function (for reading your HDFS) and split using StringTokenizer value.toString reading from MapReduce and in your mapper function write HDFS read text file code it will read line-by-line so use two while loops here you compare. Whenever you want data send it to reducer.
Split the 3gb text file into several text files and apply that all text files as usual MapReduce your previous program.
For splitting text file I written Java program and you decide how many lines you want write in each text file.

Resources