h5py reading raw data with escape characters - hdf5

Hi, I want to read the data in an HDF5 file exactly as it was written.
But when I read it with the following code, I get the output shown below.
Code
hf = h5py.File('Json.h5', 'r')
data_read = hf.get("BinaryData_metadata")
rmdwrite = open("Test.json", "w")
rmdwrite.write(str(np.array(data_read)))
rmdwrite.close()
hf.close()
Output
[b'{\n\t"TestReport": {\n\t\t"TestName": "XYZ",\n\t\t"Description"................
How can I get the exact content, with the same formatting, into my output file?
When I print it like this
Data_arr = str(np.array(data_read))
Data_arr = repr(Data_arr)
I get
'[b\'{\\n\\t"TestReport": {\\n\\t\\t"Te................
Okay, this is how I am writing the data from C++:
DataSpace dataspace(1, dimsf); //Creating Dataspace
StrType datatype(PredType::C_S1); //Creating Datatype of type char
datatype.setOrder(order); //Data Store Order
datatype.setSize(file_datastring.length()); //Datalength
datatype.setCset(H5T_CSET_UTF8);
DataSet dataset = Hdf5::fileObject.createDataSet(WriteDataSet, datatype, dataspace); //Create dataset
dataset.write(file_datastring, datatype); //Write to dataset
Is there something here that is adding the extra backslash?

The solution I found was:
hf = h5py.File(H5FileName, 'r')
FileObj = open(OutFileName, "w")
# Note: .value was removed in h5py 3.0; on newer versions use hf[H5DataSetName][()] instead
hf.get(H5DataSetName).value.tofile(FileObj)
FileObj.close()
hf.close()
This works perfectly.
Regards.
Siddharth
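To illustrate why the first attempt wrote \n and \t sequences into the file, here is a minimal sketch in plain Python. It does not read an HDF5 file: the bytes value raw is a made-up stand-in for the dataset contents, and the output path is a temporary file chosen only for the example. It shows that str() on bytes gives the printable representation, while decoding or writing the bytes in binary mode keeps the original formatting.

```python
import os
import tempfile

# Hypothetical stand-in for the bytes stored in the dataset.
raw = b'{\n\t"TestReport": {\n\t\t"TestName": "XYZ"\n\t}\n}'

# str() on bytes returns the printable representation: b'...' with
# newlines and tabs shown as backslash escapes.
as_repr = str(raw)

# Decoding yields the original text with real newlines and tabs.
as_text = raw.decode("utf-8")

# Writing the bytes in binary mode reproduces them unchanged on disk.
out_path = os.path.join(tempfile.mkdtemp(), "Test.json")
with open(out_path, "wb") as out_file:
    out_file.write(raw)

# Read the file back to confirm that nothing was altered.
with open(out_path, "rb") as in_file:
    round_trip = in_file.read()
```

The same idea applies to the array returned by h5py: write its bytes (or the decoded string), not the result of str().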

Related

Biopython Genbank.Record : trying to understand source code

I am writing a CSV reader that generates GenBank files, so that annotations are captured together with the sequence.
First I used a Bio.SeqRecord and got correctly formatted output but the SeqRecord class lacks fields that I need.
FEATURES Location/Qualifiers
HCDR1 27..35
HCDR2 50..66
HCDR3 99..109
I switched to Bio.GenBank.Record and have the needed fields except now the annotation formatting is wrong. It can't have the extra "type:" "location:" and "qualifiers:" text and the information should all be on one line.
FEATURES Location/Qualifiers
type: HCDR1
location: [26:35]
qualifiers:
type: HCDR2
location: [49:66]
qualifiers:
type: HCDR3
location: [98:109]
qualifiers:
The code for pulling annotations is the same for both versions. Only the class changed.
# Read csv entries and create a container with the data
container = Record()
container.locus = row['Sample']
container.size = len(row['Seq'])
container.residue_type="PROTEIN"
container.data_file_division="PRI"
container.date = (datetime.date.today().strftime("%d-%b-%Y")) # today's date
container.definition = row['FullCloneName']
container.accession = [row['Vgene'],row['HCDR3']]
container.version = getpass.getuser()
container.keywords = [row['ProjectName']]
container.source = "test"
container.organism = "Homo Sapiens"
container.sequence = row['Seq']
annotations = []
CDRS = ["HCDR1", "HCDR2", "HCDR3"]
for CDR in CDRS:
    start = row['Seq'].find(row[CDR])
    end = start + len(row[CDR])
    feature = SeqFeature(FeatureLocation(start=start, end=end), type=CDR)
    container.features.append(feature)
I have looked at the source code for Bio.GenBank.Record but can't figure out why a SeqFeature is formatted differently there than in Bio.SeqRecord.
Is there an elegant fix, or do I have to write a separate tool to reformat the annotations in the GenBank file?
After reading the source code again, I discovered that Bio.GenBank.Record has its own Feature class that takes the key and location as strings. These are formatted correctly in the output GenBank file.
CDRS = ["HCDR1", "HCDR2", "HCDR3"]
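for CDR in CDRS:
    start = row['Seq'].find(row[CDR])
    end = start + len(row[CDR])
    feature = Feature()
    feature.key = CDR
    # GenBank locations are 1-based and inclusive, so shift the 0-based start
    # by one (the span [26:35] is written as 27..35)
    feature.location = "{}..{}".format(start + 1, end)
    container.features.append(feature)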
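The two outputs in the question differ by one at the start position ([26:35] versus 27..35) because Python slices are 0-based and end-exclusive, while GenBank locations are 1-based and inclusive. A minimal sketch of that conversion follows; the helper names genbank_location and find_location are hypothetical and not part of Biopython.

```python
def genbank_location(start, end):
    """Convert a 0-based, end-exclusive span to GenBank 'start..end' text."""
    if start < 0 or end <= start:
        raise ValueError("invalid span: [{}:{}]".format(start, end))
    # Only the start shifts: an exclusive 0-based end equals the inclusive
    # 1-based end.
    return "{}..{}".format(start + 1, end)


def find_location(sequence, fragment):
    """Locate fragment in sequence; return its GenBank location or None."""
    start = sequence.find(fragment)
    if not fragment or start == -1:
        # str.find returns -1 when the fragment is absent.
        return None
    return genbank_location(start, start + len(fragment))


location = genbank_location(26, 35)
```

Checking for a missing fragment matters here because str.find returns -1 rather than raising, which would otherwise produce a meaningless location string.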

Write Twitter Frequency analysis to a CSV using python

How do I write the output of my code to a CSV file?
Here is what I'm trying. The frequency analysis works, but I can't get the CSV to write. I'm pretty new to Python, so I am sure that I am doing something wrong.
# This Python file uses the following encoding: utf-8
import os, sys
import re
import csv
filename = 'TweetsCSV_ORIGINAL.txt'
word_list = re.split('\s+', file(filename).read().lower())
print 'Words in text:', len(word_list)
freq_dic = {}
punctuation = re.compile(r'[.?!,":;]')
for word in word_list:
    word = punctuation.sub("", word)
    try:
        freq_dic[word] += 1
    except:
        freq_dic[word] = 1
print 'Unique words:', len(freq_dic)
freq_list = freq_dic.items()
freq_list.sort()
for word, freq in freq_list:
    print word, freq
#write to CSV
res = [word, freq]
csvfile = "tweetfreq.csv"
#Assuming res is a flat list
with open(csvfile, "w") as output:
    writer = csv.writer(output, lineterminator='\n')
    for val in res:
        writer.writerow([val])
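This snippet appends one row to the end of your CSV file. Run it inside the for word, freq in freq_list loop so that every word and its count is written, rather than only the last pair:
with open('tweetfreq.csv', 'a') as csvfile:
    csv_writer = csv.writer(csvfile)
    csv_writer.writerow([word, freq])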
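As an alternative to reopening the file for every row, the counting and writing can be done in one pass. The following is a minimal Python 3 sketch, not the asker's original Python 2 code: the function names, the header row, and the sample text are assumptions made for illustration, and an in-memory buffer stands in for the output file.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count lower-cased words after stripping common punctuation."""
    punctuation = re.compile(r'[.?!,":;]')
    words = (punctuation.sub("", word) for word in text.lower().split())
    # Skip tokens that consisted only of punctuation.
    return Counter(word for word in words if word)


def write_frequencies(counts, output):
    """Write one 'word,count' row per word, sorted alphabetically."""
    writer = csv.writer(output, lineterminator="\n")
    writer.writerow(["word", "count"])  # header row added for this sketch
    for word, freq in sorted(counts.items()):
        writer.writerow([word, freq])


sample = "The cat, the hat; the CAT!"
buffer = io.StringIO()
write_frequencies(word_frequencies(sample), buffer)
csv_text = buffer.getvalue()
```

To write a real file instead of the buffer, pass a file opened with open("tweetfreq.csv", "w", newline="") to write_frequencies.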

How can I efficiently parse formatted text from a file in Qt?

I am new to Qt and would like to find an efficient way of working with strings.
What I am doing:
I load a text file and read it line by line.
Each line contains comma-separated entries.
Line schema:
Fname{limit:list:option}, Lname{limit:list:option} ... etc.
Example:
John{0:0:0}, Lname{0:0:0}
Note: limit, list, and option can each be 0 or 1.
So I would like to extract Fname and the limit, list, and option values from inside the braces.
I am thinking of writing code that finds { and takes what is inside, reading character by character.
What is an efficient way to parse this?
Thanks.
The following snippet will give you Fname and limit,list,option from the first set of brackets. It could be easily updated if you are interested in the Lname set as well.
QFile file("input.txt");
if (!file.open(QIODevice::ReadOnly | QIODevice::Text))
    qDebug() << "Failed to open input file.";
QRegularExpression re("(?<name>\\w+)\\{(?<limit>[0-1]):(?<list>[0-1]):(?<option>[0-1])}");
while (!file.atEnd())
{
    QString line = file.readLine();
    QRegularExpressionMatch match = re.match(line);
    if (!match.hasMatch())
        continue; // Skip lines that do not follow the schema
    QString name = match.captured("name");
    int limit = match.captured("limit").toInt();
    int list = match.captured("list").toInt();
    int option = match.captured("option").toInt();
    // Do something with values ...
}
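To pick up every entry on a line (the Lname set as well as the first one), the same pattern can be applied repeatedly. The sketch below shows the idea in Python with re.finditer; the function name parse_line and the sample line are assumptions. In Qt, QRegularExpression::globalMatch plays the corresponding role.

```python
import re

# Same pattern as in the Qt snippet, written with Python's named-group syntax.
ENTRY = re.compile(
    r"(?P<name>\w+)\{(?P<limit>[01]):(?P<list>[01]):(?P<option>[01])\}"
)


def parse_line(line):
    """Return (name, limit, list, option) for every entry found in line."""
    return [
        (
            match.group("name"),
            int(match.group("limit")),
            int(match.group("list")),
            int(match.group("option")),
        )
        for match in ENTRY.finditer(line)
    ]


entries = parse_line("John{0:0:0}, Lname{0:1:0}")
```

Text that does not follow the schema is simply skipped, so a line without valid entries yields an empty list.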

Getting classification result from mahout

I have finally been able to train a Mahout classifier. Now my problem is: how can I get the target category for an input document?
What is the process for getting the target category of my text documents?
First, you have to vectorize the text document, for example as a RandomAccessSparseVector.
Some sample code for your reference:
Vector vector = new RandomAccessSparseVector(FEATURES);
FeatureExtractor fe = new FeatureExtractor();
HashSet<String> fs = fe.extract(text);
for (String s : fs) {
    int index = dictionary.get(s);
    vector.setQuick(index, frequency.get(index));
}
Then, use the Classifier.classify(Vector) to get the result.
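The vectorization step can be pictured without any Mahout classes. The following is an illustrative sketch in plain Python, not the Mahout API: a dict stands in for the sparse vector, and the dictionary (term to index) and frequency (index to weight) mappings, like the sample values, are hypothetical.

```python
def vectorize(tokens, dictionary, frequency):
    """Build a sparse vector {feature index: weight} from a document's tokens."""
    vector = {}
    for token in set(tokens):
        index = dictionary.get(token)
        if index is None:
            # Terms unseen during training have no feature index; skip them.
            continue
        vector[index] = frequency.get(index, 0.0)
    return vector


# Hypothetical term-to-index and index-to-weight mappings.
dictionary = {"hdf5": 0, "lua": 1, "csv": 2}
frequency = {0: 1.5, 1: 0.7, 2: 2.0}

vector = vectorize(["csv", "lua", "csv", "unknown"], dictionary, frequency)
```

Skipping unknown terms is a design choice of this sketch; the Java snippet above would fail on a term that is missing from the dictionary.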

Decompressing LZW in Lua [duplicate]

Here is the Pseudocode for Lempel-Ziv-Welch Compression.
pattern = get input character
while ( not end-of-file ) {
    K = get input character
    if ( <<pattern, K>> is NOT in the string table ) {
        output the code for pattern
        add <<pattern, K>> to the string table
        pattern = K
    }
    else { pattern = <<pattern, K>> }
}
output the code for pattern
output EOF_CODE
I am trying to code this in Lua, but it is not really working. Here is the code I modeled after an LZW function in Python, but I am getting an "attempt to call a string value" error on line 8.
function compress(uncompressed)
    local dict_size = 256
    local dictionary = {}
    w = ""
    result = {}
    for c in uncompressed do
        -- while c is in the function compress
        local wc = w + c
        if dictionary[wc] == true then
            w = wc
        else
            dictionary[w] = ""
            -- Add wc to the dictionary.
            dictionary[wc] = dict_size
            dict_size = dict_size + 1
            w = c
        end
        -- Output the code for w.
        if w then
            dictionary[w] = ""
        end
    end
    return dictionary
end
compressed = compress('TOBEORNOTTOBEORTOBEORNOT')
print (compressed)
I would really like some help either getting my code to run, or helping me code the LZW compression in Lua. Thank you so much!
Assuming uncompressed is a string, you'll need to use something like this to iterate over it:
for i = 1, #uncompressed do
local c = string.sub(uncompressed, i, i)
-- etc
end
There's another issue on line 10; .. is used for string concatenation in Lua, so this line should be local wc = w .. c.
You may also want to read this with regard to the performance of string concatenation. Long story short, it's often more efficient to keep each element in a table and return it with table.concat().
You should also take a look here to download the source for a high-performance LZW compression algorithm in Lua...
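For comparison while porting, here is a minimal Python sketch of the compression pseudocode from the question. It is an illustration under a few assumptions: the table is seeded with the 256 single-character codes, every input character has a code point below 256, and no EOF_CODE is emitted. The function name lzw_compress is hypothetical.

```python
def lzw_compress(text):
    """Return the list of LZW codes for text (characters with code < 256)."""
    # Seed the string table with all single-character patterns.
    table = {chr(i): i for i in range(256)}
    next_code = 256

    pattern = ""
    codes = []
    for char in text:
        candidate = pattern + char
        if candidate in table:
            # Keep extending the pattern while it is still known.
            pattern = candidate
        else:
            # Emit the code for the known pattern, then learn the new one.
            codes.append(table[pattern])
            table[candidate] = next_code
            next_code += 1
            pattern = char

    # Emit the code for whatever pattern is left at the end of the input.
    if pattern:
        codes.append(table[pattern])
    return codes


compressed = lzw_compress("TOBEORNOTTOBEORTOBEORNOT")
```

Note that the codes are collected in a separate list and the table stores each pattern's numeric code, whereas the Lua attempt overwrites table entries with "" and returns the table itself.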
