PowerShell 7 Byte encoding an image file - character-encoding

I'm using PowerShell to upload files to a web site through an API.
In PS5.1, this would get the image in the correct B64 encoding to be processed by the API at the other end:
$b64 = [convert]::ToBase64String((get-content $image_path -encoding byte))
In PS7, this breaks with the error:
Get-Content: Cannot process argument transformation on parameter 'Encoding'. 'byte' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method. (Parameter 'name')
I've tried reading the content in another encoding and then calling GetBytes() on a [System.Text.Encoding] instance to convert it, but the byte array is always different. E.g.:
PS 5.1> $bytes = get-content -Path $image -Encoding byte ; Write-Host "bytes:" $bytes.count ; Write-Host "First 11:"; $bytes[0..10]
bytes: 31229
First 11:
137
80
78
71
13
10
26
10
0
0
0
But on PowerShell7:
PS7> $enc = [system.Text.Encoding]::ASCII
PS7> $bytes = $enc.GetBytes( (get-content -Path $image -Encoding ascii | Out-String)) ; Write-Host "bytes:" $bytes.count ; Write-Host "First 11:"; $bytes[0..10]
bytes: 31416 << larger
First 11:
63 << diff
80 << same
78 <<
71
13
10
26
13 << new
10
0
0
I've tried other combinations of encodings without any improvement.
Can anyone suggest where I'm going wrong?

Since PowerShell 6, Byte is no longer a valid argument for the -Encoding parameter. Use the -AsByteStream parameter instead, in combination with -Raw, like so:
$b64 = [convert]::ToBase64String((get-content $image_path -AsByteStream -Raw))
There is even an example in the help for Get-Content that explains how to use these new parameters.

The problem turned out to be with Get-Content. I bypassed the problem using:
$bytes = [System.IO.File]::ReadAllBytes($image_path)
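NOTE: $image_path needs to be absolute, not relative, because .NET methods resolve relative paths against the process working directory, which can differ from PowerShell's current location.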
So my Base64 line became:
$b64 = [convert]::ToBase64String([System.IO.File]::ReadAllBytes($image_path))
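Why the text-encoding attempts in the question could never work can be sketched outside PowerShell. The following is a minimal Python illustration, not PowerShell code: it takes the first eight byte values from the PS 5.1 output above and shows that Base64-encoding the raw bytes keeps them intact, while a round trip through ASCII text turns the non-ASCII value 137 into `?` (63), the same first byte seen in the PS7 attempt.

```python
import base64

# First eight bytes of the image, copied from the PS 5.1 output above.
png_header = bytes([137, 80, 78, 71, 13, 10, 26, 10])

# Encoding the raw bytes directly preserves every value.
b64 = base64.b64encode(png_header).decode("ascii")

# Round-tripping through ASCII text is lossy: 137 is not valid ASCII,
# so it is replaced on decode and comes back as '?' (63) on encode.
as_text = png_header.decode("ascii", errors="replace")
round_tripped = as_text.encode("ascii", errors="replace")

print(b64)
print(list(round_tripped))
```

The extra CR/LF bytes in the PS7 output have a different cause (line-by-line reading followed by Out-String), but the conclusion is the same: binary files should be read as bytes, never as text.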


How to split paired-end fastq files?

I have Illumina paired-end reads contained within one .fastq file, denoted as '/1' for forward reads and '/2' for reverse reads.
I am using grep to pull out the individual reads and place them into 2 respective files (one for forward reads and one for reverse).
grep -A 3 "/1$" sample21_pe.unmapped.fq > sample21_1_rfa.fq
grep -A 3 "/2$" sample21_pe.unmapped.fq > sample21_2_rfa.fq
However, when I try to use the files (fastqc, assembly, etc.), they do not work. When running fastqc, I get the following error:
Failed to process file sample21_1_rfa.fq
uk.ac.babraham.FastQC.Sequence.SequenceFormatException: ID line didn't start with '#'
at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:134)
at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
at java.lang.Thread.run(Thread.java:662)
But if you look at the files, the identifier does indeed start with a '#'. Any advice on why these files aren't working? I had originally converted .bam files into the .fastq files with
samtools bam2fq
Here are samples of each individual file:
merged .fastq
#HISEQ:534:CB14TANXX:4:1101:1091:2161/1
GAGAAGCTCGTCCGGCTGGAGAATGTTGCGCTTGCGGTCCGGAGAGGACAGAAATTCGTTGATGTTAACGGTGCGCTCGCCGCGGACGCTCTTGATGGTGACGTCGGCGTTGAGCGTGACGCACG
+
B/</<//B<BFF<FFFFFF/BFFFFFFB<BFFF<B/7FFF7B/B/FF/F/<<F/FFBFFFBBFFFBFB/FF<BBB<B/B//BBFFFFFFF/B/FF/B77B//B7B7F/7F###############
#HISEQ:534:CB14TANXX:4:1101:1091:2161/2
TGACGCCTGCCGTCAGGTAGGTTCTCCGCAGATCCGAAATCTCGCGACGCTCGGCGGCAACATCTGCCAGTCGTCCGTGGCGGGCGACGGTCTCGCGGCGTGCGTCACGCTCAACGCCGACGTAC
+
/B<B//F/F//B<///<FB/</F<<FFFFF<FFBF/FF<//FB/F//F7FBFFFF/B</7<F//<BB7/7BB7/B<F7BF<BFFFB7B#####################################
#HISEQ:534:CB14TANXX:4:1101:1637:2053/1
NGTTTACCATACAACAATCTTGCGACCTATTCAAATCATCTATATGCCTTATCAAGTTTTCATAGCTTTCAAGATTCTCAATTTCCTCACGTCTCGCTTTGCTCAACCTACAAAAACTCTCTTCT
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFB/BFBBFBB<<<<FFFFFFBB<FBFFBFF
#HISEQ:534:CB14TANXX:4:1101:1637:2053/2
TCGGTCGTTGGGAAAAGACCTGTGGTAAACATCCTACGCAAAAGCCATTGCGGTTACTCGTTCGTATGATTCTTGCATCAACTAATCAAGGCGATTGGGTTCTCGACCCATTTTGTGGAAGTTCG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFB<FF<<BBFB
#HISEQ:534:CB14TANXX:4:1101:1792:2218/1
TCTATCGGCTGACCGATAAGCTGTCGCCTGCCGACCGTCCTGCCATGGGACGGCGCATCGCACAGCTCACCCTGGACTAACTCTCCAACACCATGATGCTGACACGCTCGGCAAAAACACCCGAT
+
<<B/<B</FF/<B/<//F<//FF<<<FF//</7/F<</FFF####################################################################################
#HISEQ:534:CB14TANXX:4:1101:1792:2218/2
TGCCGGAGGGCGTCGATGGTGGCATCGAGCTTTTTTGCCGAGCGTGTCAGCATGATGGTGTTGTAGAGATAGTCCATGGTGAGCTGTGCGATGCGCCGTTCCATGGCAGGACGGTCGGCAGGCGC
+
BBBBBFFFFFFFFBFFFBBFFFFFFFFFFFBBFFFF/FF<F7FF//F/FBB/FFBFFF/F7BFF<F/FFFFFFFFB/7BB<7BFFFFFFFFFFFFF<B///B/7B/7/B//77BB//7B/B7/B#
#HISEQ:534:CB14TANXX:4:1101:1903:2238/1
TATTCCAGCGACCGTTATAATCAAACTCAACTACATAGTCATTGCGGATTGCTTCAAGAAATTTTTTCCAGACTATTTCATCAATATTTATTTTGGGAACTGGTGCAACAGCAATTCTTTTTAAA
+
BBBBBFFFFFFFFFFFFFBFF/FFBFFBFFFFFFFF/FFFFFF<<FFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFBF/B/<B<B/FBF7/<FFFFFFF/BB/7///7FF<BFFF//B/FFF###
#HISEQ:534:CB14TANXX:4:1101:1903:2238/2
TAAGGTTGGAGAAGCAACAATTTACCGTGATATTGATTTGCTCCGAACATATTTTCATGCGCCACTCGAGTTTGACAGGGAGAAAGGCGGGTATTATTATTTTAATGAAAAATGGGATTTTGCCC
+
B<BBBFFFFFFF<FFFFFFFFFFFFFFFFFF/BFFFFFFF<<FF<F<FFF/FF/FFFFBFB</<//<B/////<<FFFFB/<F<BFF/7/</7/7FB/B/BFF<//7BFF###############
#HISEQ:534:CB14TANXX:4:1101:2107:2125/1
TGTAGTATTTATTACATCATATAGAATGTAGATATAAAAGATGAAAAAGCTATAATTTCTTTGATAATATAAGGAGGGAATAACACTATGAGGATTGATAGAGCAGGAATCGAGGATTTGCCGAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBFFFFFFFBB<FBB7BFF#
#HISEQ:534:CB14TANXX:4:1101:2107:2125/2
TACCACTATCGGCAAATCCTCGATTCCTGCTCTATCAATCCTCATAGTGTTATTCCCTCCTTATATTATCAAAGAAATTATAGCTTTTTCATCTTTTATATCTACATTCTATATGATGTAATAAA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFBFBFFFFFFFBBFFFFFFFBF7F/B/BBF7/</FF/77F/77BB#
#HISEQ:534:CB14TANXX:4:1101:2023:2224/1
TCACCAGCTCGGCACGCTTGTCCTTGACCTCCTGCTCGATCTGACCGTCCATCTTGGCTGCCACGGTGTTCTCCTCGGCGGAGTAGGCAAAGCAGCCCAGACGGTCGAACTGTATCTCCTTGACA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFB<<B7BBFBFFF<FFBBFFFBF/7B/<B<
#HISEQ:534:CB14TANXX:4:1101:2023:2224/2
TCGAGGATCTGTGCAACTTTGTCAAGGAGATACAGTTCGACCGTCTGGGCTGCTTTGCCTACTCCGCCGAGGAGAACACCGTGGCAGCCAAGATGGACGGTCAGATCGAGCAGGAGGTCAAGGAC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFBFBFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFF<7BF/<<BB###
#HISEQ:534:CB14TANXX:4:1101:2038:2235/1
TTTATGCGAATGTAGAGTGGCTTCTCCACTGCCTCGGTGAAGCCCACGCGCGAGATGAGCGAATTAAGCTGCTTTGCAGTGAATTGCATTGCATATACACCTGCGTCGGCTTGAATACTTGTGCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFF//BFFFFFFFFFFFFF<B<BB###
#HISEQ:534:CB14TANXX:4:1101:2038:2235/2
AATCCGCTCGTGAAAGCTCCCGATAACGCCACAGTGAACACCGTGGAGTTCTCTGATACCGAAGATTTCGCACGCAGCACAAGTATTCAAGCCGACGCAGGTGTATATGCAATGCAATTCACTGC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBBFFFFFFFFFFFFFFFFFFFFFFF
#HISEQ:534:CB14TANXX:4:1101:2271:2041/1
NACACTTGTCGATGATCTTGCCAAGCTGCTTCTTGCCCACCAGGAAGCCGATCTCCAGATCAAACTCGTGGCCGGGAACACTCCGGTCCACAAAGCCCAGGTCCTGGGGAATGGGCTCATCGTAG
+
#<</BB/F/BB/F<FFFFFFFFF/<BFFFFFFFF<<FFBFFFFFFBFBFBBB<<FFFFBFFF/<B/FFFFFFFFFFFFFFFFF<FB<<BFF77BFFF/<BFFFB<</BB</7BFFFB########
#HISEQ:534:CB14TANXX:4:1101:2271:2041/2
GACTCATCTACAATGAGCCCATTCCCCAGGACCTGGGCTTTGTGGACCGGAGTGTTCCCGGCCACGAGTTTGATCTGGAGATCGGCTTCCTGGTGGGCAAGAAGCAGCTTGGCAAGATCATCGCC
+
<<BBBFFF<F/BFFFBFBF<BFF<<F/FFFBFFFF<<FFFFBFFFFFFBFFF/<B<F/<</<FFF//FFFFF/<<F/B/B/7/FF<<FF/7B/BBB/7///7////<B/B/BB/B/B/B/7BB##
Example of forward reads after being pulled out and placed into their own .fastq file:
#HISEQ:534:CB14TANXX:4:1101:1091:2161/1
GAGAAGCTCGTCCGGCTGGAGAATGTTGCGCTTGCGGTCCGGAGAGGACAGAAATTCGTTGATGTTAACGGTGCGCTCGCCGCGGACGCTCTTGATGGTGACGTCGGCGTTGAGCGTGACGCACG
+
B/</<//B<BFF<FFFFFF/BFFFFFFB<BFFF<B/7FFF7B/B/FF/F/<<F/FFBFFFBBFFFBFB/FF<BBB<B/B//BBFFFFFFF/B/FF/B77B//B7B7F/7F###############
--
#HISEQ:534:CB14TANXX:4:1101:1637:2053/1
NGTTTACCATACAACAATCTTGCGACCTATTCAAATCATCTATATGCCTTATCAAGTTTTCATAGCTTTCAAGATTCTCAATTTCCTCACGTCTCGCTTTGCTCAACCTACAAAAACTCTCTTCT
+
#<<BBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFFFFFFFFFFFB/BFBBFBB<<<<FFFFFFBB<FBFFBFF
--
#HISEQ:534:CB14TANXX:4:1101:1792:2218/1
TCTATCGGCTGACCGATAAGCTGTCGCCTGCCGACCGTCCTGCCATGGGACGGCGCATCGCACAGCTCACCCTGGACTAACTCTCCAACACCATGATGCTGACACGCTCGGCAAAAACACCCGAT
+
<<B/<B</FF/<B/<//F<//FF<<<FF//</7/F<</FFF####################################################################################
--
#HISEQ:534:CB14TANXX:4:1101:1903:2238/1
TATTCCAGCGACCGTTATAATCAAACTCAACTACATAGTCATTGCGGATTGCTTCAAGAAATTTTTTCCAGACTATTTCATCAATATTTATTTTGGGAACTGGTGCAACAGCAATTCTTTTTAAA
+
BBBBBFFFFFFFFFFFFFBFF/FFBFFBFFFFFFFF/FFFFFF<<FFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFBF/B/<B<B/FBF7/<FFFFFFF/BB/7///7FF<BFFF//B/FFF###
--
#HISEQ:534:CB14TANXX:4:1101:2107:2125/1
TGTAGTATTTATTACATCATATAGAATGTAGATATAAAAGATGAAAAAGCTATAATTTCTTTGATAATATAAGGAGGGAATAACACTATGAGGATTGATAGAGCAGGAATCGAGGATTTGCCGAT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFF/FFFFFFFFFFFFFFFFFFFFFFFFFFFFBBBFFFFFFFBB<FBB7BFF#
--
#HISEQ:534:CB14TANXX:4:1101:2023:2224/1
TCACCAGCTCGGCACGCTTGTCCTTGACCTCCTGCTCGATCTGACCGTCCATCTTGGCTGCCACGGTGTTCTCCTCGGCGGAGTAGGCAAAGCAGCCCAGACGGTCGAACTGTATCTCCTTGACA
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFFFFFFFFB<<B7BBFBFFF<FFBBFFFBF/7B/<B<
--
#HISEQ:534:CB14TANXX:4:1101:2038:2235/1
TTTATGCGAATGTAGAGTGGCTTCTCCACTGCCTCGGTGAAGCCCACGCGCGAGATGAGCGAATTAAGCTGCTTTGCAGTGAATTGCATTGCATATACACCTGCGTCGGCTTGAATACTTGTGCT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFF//BFFFFFFFFFFFFF<B<BB###
--
#HISEQ:534:CB14TANXX:4:1101:2271:2041/1
NACACTTGTCGATGATCTTGCCAAGCTGCTTCTTGCCCACCAGGAAGCCGATCTCCAGATCAAACTCGTGGCCGGGAACACTCCGGTCCACAAAGCCCAGGTCCTGGGGAATGGGCTCATCGTAG
+
#<</BB/F/BB/F<FFFFFFFFF/<BFFFFFFFF<<FFBFFFFFFBFBFBBB<<FFFFBFFF/<B/FFFFFFFFFFFFFFFFF<FB<<BFF77BFFF/<BFFFB<</BB</7BFFFB########
--
#HISEQ:534:CB14TANXX:4:1101:2678:2145/1
CTGTACATAGTACGTATTTGACGCCTGCGTCGATGTAGCGTTTGAGGAAGGGAAGCAGCGGTTCTGCAGAGTCCTCTTTCCATCCGTTGATGCTAATCATTCCGTTGCGTACATCCGCTCCGAGA
+
BBBBBFFFFFFF<FFF<FFFFFFFFBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<BFFF7BFFFFFFFF<BBFFFFFFFFBBFBBB<FFBFFFFFFFFFFFFB<BFFFFFFBFB/BFFF####
--
#HISEQ:534:CB14TANXX:4:1101:2972:2114/1
CTCTGTGCCGATCCCTTTGCCTTTGCGTTTTGAGGAAAGGAAACCACCTTCTGGGTCGGTGAGGATAGTTCCGGTGAAGGTGTTGTCCACCGCCAGGCATAGGGAATAGCTGTCAGCCTTTGCTC
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFB/FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFBFFFF<FFFFFFFFFF<BFFFFF
--
#HISEQ:534:CB14TANXX:4:1101:2940:2222/1
CTAATTTTTTCATTATATTACTAATTTTGTAATTGGTAAAATATTATAATATCCTTGTACATTAAGACCCCAATAATCAGAAGAAGTAAAATTAATTCCTGCAACAGTTCTTAAATATCCATTAG
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFF<FBFFFFFFFFFFFFFFF/FBFBFFBFFFFF/<F<FFFFFFFFFF<FFFFFFBFFFFFFFFF</FBFBBF<F/7//FFBFBBFFF/<7BF#
--
#HISEQ:534:CB14TANXX:4:1101:3037:2180/1
CGTCAGTTCCGCAACGATAAAGAGTTCCGCATTGCAGTCACCTGTACGCTGGTAGCCACCGGAACCGATGTCAAGCCGTTGGAGGTGGTGATGTTCATGCGCGACGTAGCTTCCGAGCCGTTATA
+
B/BBBBBFFFFFFF<FFBFFFFFF<FFFFBFFFFFFF<BBFFFFFFFFFFFFFFFFFBFF/FFFFBFFBFFFFBFF/7F/BFB/BBFFFFFFFFBFF<BBF<7BBFFFFFFBBFFF/B#######
--
#HISEQ:534:CB14TANXX:4:1101:3334:2171/1
ACCGATGTACATACCCGGACGGGTACGCACATGCTCCATATCGCTCAAGTGGCGGATATTGTCATCTGTATATTCTACAGGTTGCTCCTGAGGGGTATTTGCCAGTTCTTCGGCAGCACCCTTTT
+
BBBBBFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFFFFFFBFFFFFFFFFFFBFFFFFFFFFF</<BFFFFFFFFBBFFFFFFBF</BB///BF<FFFFF<</<B
--
#HISEQ:534:CB14TANXX:4:1101:3452:2185/1
CGCAGACGGATTTGCTTGAAGTCCGTCTCATCGTATTCCGACAACTCATCGAGGAACACACGCTTGTATTGACTGATACCCTTGATTTTCTCCGGGTCGTCAAGACCACTGAAATCAATCTTGCC
+
BBBBBFFFF<FFFFFFFFFFFFFFFFFFFFFFFFFFF<FFFFFBFFFFFFFFFFFFFFFFFFFFBFFFFFFFFFFBFFFFFBFFBFFFFFFFFFB/77B/FBBFFF/<FFF/77BBFFFBFFBBB
--
Any advice would be appreciated. Thanks!
In general, this operation is called deinterlace fastq or deinterleave fastq. The question already has the answer here:
deinterleave fastq file
https://www.biostars.org/p/141256/
I am copying it here, with minor reformatting for clarity:
paste - - - - - - - - < interleaved.fq \
| tee >(cut -f 1-4 | tr "\t" "\n" > read1.fq) \
| cut -f 5-8 | tr "\t" "\n" > read2.fq
This command converts the interlaced fastq file into an 8-column tsv file, cuts columns 1-4 (the read 1 lines), changes them from tsv back to fastq format (by replacing tabs with newlines) and redirects the output to read1.fq. From the same STDOUT stream (for speed), using tee, it cuts columns 5-8 (the read 2 lines), converts them the same way, and redirects the output to read2.fq.
You can also use these command line tools:
iamdelf/deinterlace: Deinterlaces paired-end FASTQ files into first and second strand files.
https://github.com/iamdelf/deinterlace
deinterleave FASTQ files
https://gist.github.com/nathanhaigh/3521724
Or online tools with Galaxy web UI, for example this tool: "FASTQ splitter on joined paired end reads", installed on several public Galaxy instances, such as https://usegalaxy.org/ .
Avoid using a regex for simple fastq file parsing if you can use line numbers, both for speed (pattern matching is slower than simple counting) and for robustness.
It is highly unlikely, but a pattern like ^#.*/1$ (or whatever later readers might change it to when reusing this code) can also match the base quality line. A good general rule is to simply rely on the fastq spec, which says 4 lines per record.
Note that #, /, 1, and 2 characters are allowed in Illumina Phred scores: https://support.illumina.com/help/BaseSpace_OLH_009008/Content/Source/Informatics/BS/QualityScoreEncoding_swBS.htm .
A one-liner that pulls out such (admittedly, very rare) reads is left as an exercise to the reader.
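The line-counting approach recommended above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not a replacement for the tools listed: the function name `deinterleave` and the tiny sample are made up for this example, records are assumed to be exactly 4 lines each, and forward and reverse records are assumed to strictly alternate. Header lines are written as they appear in the samples above. The last quality line in the sample is chosen so that a pattern like ^#.*/1$ would wrongly match it, while counting lines is unaffected.

```python
from itertools import islice

def deinterleave(lines):
    """Split interleaved FASTQ lines into (forward, reverse) record lists.

    Records are located purely by position: 4 lines per record,
    alternating forward/reverse. The line content is never inspected.
    """
    it = iter(lines)
    forward, reverse = [], []
    index = 0
    while True:
        record = list(islice(it, 4))
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        (forward if index % 2 == 0 else reverse).append(record)
        index += 1
    return forward, reverse

# Hypothetical two-record sample; the second quality line looks like a header.
sample = [
    "#r1/1", "ACGT", "+", "FFFF",
    "#r1/2", "TGCA", "+", "#F/1",
]
forward, reverse = deinterleave(sample)
print(forward)
print(reverse)
```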
The fastq format uses 4 lines per read.
Your snippet has 5, as there are -- lines. That could confuse software expecting a 4-line format.
You can add --no-group-separator to the grep call to avoid adding that separator.
I usually follow these steps to convert bam to fastq.gz:
samtools bam2fq myBamfile.bam > myBamfile.fastq
cat myBamfile.fastq | grep '^#.*/1$' -A 3 --no-group-separator > sample_1.fastq
cat myBamfile.fastq | grep '^#.*/2$' -A 3 --no-group-separator > sample_2.fastq
gzip sample_1.fastq
gzip sample_2.fastq
Once you have the two files, you should order them to be sure that the reads are really paired.
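One way to check that the two files are really paired is to compare the read IDs record by record. The sketch below is a hedged Python illustration: the helper names `read_ids` and `properly_paired` are hypothetical, and it assumes each file holds 4-line records whose header ends in /1 or /2, as in the samples above.

```python
def read_ids(lines, suffix):
    """Return the ID of every 4-line record, with the /1 or /2 suffix removed."""
    ids = []
    for header in lines[0::4]:  # every 4th line, starting with the first, is a header
        if not header.endswith(suffix):
            raise ValueError(f"unexpected header: {header}")
        ids.append(header[: -len(suffix)])
    return ids

def properly_paired(forward_lines, reverse_lines):
    """True when both files list the same read IDs in the same order."""
    return read_ids(forward_lines, "/1") == read_ids(reverse_lines, "/2")

# Hypothetical miniature files.
fwd = ["#a/1", "ACGT", "+", "FFFF", "#b/1", "GGCC", "+", "FFFF"]
rev_ok = ["#a/2", "TTAA", "+", "FFFF", "#b/2", "CCGG", "+", "FFFF"]
rev_swapped = ["#b/2", "CCGG", "+", "FFFF", "#a/2", "TTAA", "+", "FFFF"]

print(properly_paired(fwd, rev_ok))
print(properly_paired(fwd, rev_swapped))
```

If the check fails, the files need to be sorted or re-synchronised before being passed to tools that expect matching order.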
We can split FASTQ files using Seqkit.
seqkit split2 -p 2 sample21_pe.unmapped.fq
https://bioinf.shenwei.me/seqkit/usage/#split2
Example 4 there is relevant to this question.
I'm not sure whether it recognizes the read ID. It splits the records and writes them alternately into the first and second output files.

Parsing text from .txt files

I have a tabbed log file, but I need only a few characters from the lines that begin with 30.10.
Using the command
awk '/^30.10/{print}' FOOD_ORDERS_201907041307.DEL
I get this output:
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
What I need to extract is 3547 and the last n characters at the very end, after the zeros.
So, expected output will be:
3547
34
29
11
But if the last 10 characters contain leading zeros and a number, I need that number.
While your question is unclear, your answer to Ed Morton's comment provides a bit more clarity on what you are trying to achieve. What is still unclear is exactly what you want from the third field. From your question and the various comments, it appears that if the line begins with 30.10 you want the first 4 digits from the second field and the rightmost run of digits in [1-9] from the third field.
If that accurately captures what you need, then awk with a combination of substr, match and length string functions can isolate the digits you are interested in. For example:
awk '/^30.10/ {
l=match ($3, /[1-9]+$/)
print substr ($2, 1, 4) " " substr ($3, l, length($3)-l+1)
}' test
Would take the input file (borrowed from Dudi Boy's answer), e.g.
$ cat test
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000001143
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
and return to you:
3547 34
3547 1143
3547 29
3547 11
Let me know if that accurately captures what you need.
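For comparison, here is a small Python sketch of the other reading of the question, namely "take the last 10 characters of the third field and drop the leading zeros". That interpretation is an assumption based on the question's last sentence, and the function name `extract` is made up for this example. Unlike a match on [1-9]+$, it also handles values that end in zero, such as 10.

```python
def extract(line):
    """Return (first 4 chars of field 2, last 10 chars of field 3 as a number),
    or None for lines that do not start with 30.10."""
    fields = line.split()  # splits on tabs or spaces
    if len(fields) < 3 or not fields[0].startswith("30.10"):
        return None
    prefix = fields[1][:4]
    number = int(fields[2][-10:])  # int() discards the leading zeros
    return prefix, number

rows = [
    "30.1006 35470015000205910002019070420190705 00000014870000000034",
    "30.1006 35470015000205900002019070420190705 00000014890000001143",
    "20.0001 this line is ignored",
]
results = [extract(row) for row in rows]
print(results)
```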
Here is a simple awk script to do the task:
script.awk
/^30.10/ { # for each line starting with 30.10
last2chars = substr($3, length($3)-1); # extract last 2 chars from 3rd field into variable last2chars
if($3 ~ /00001143$/) last2chars = 1143; # if 3rd field ends with 1143, update variable last2chars respectively
print last2chars; # output variable last2chars
}
input.txt
30.1006 35470015000205910002019070420190705 00000014870000000034
30.1006 35470015000205900002019070420190705 00000014890000001143
30.1006 35470015000205900002019070420190705 00000014890000000029
30.1006 35470023000205920002019070420190705 00000014900000000011
running:
awk -f script.awk input.txt
output:
34
1143
29
11
GOT Part of it!
awk '/^30.10/{print}' FOOD_ORDERS_201907041307.DEL | sed 's/.*\(..\)/\1/'

epson send data tools HOW to

I want to send binary data (ESC/POS commands) via the EPSON Send Data Tool (senddat.exe).
According to their website/manual, from the command prompt:
If the printer is set as a USB printer class:
senddat.exe scriptfile USBPRN
(C:\senddat.exe sample.txt ESDPRT001)
file:sample.txt
' Sample script of senddat
' Version 0.01
'Comment line is starting ' character
!Display line is starting ! character
.Pause line is starting . character
'Decimal data
48 49 50 51 CR LF
'Hexadecimal data
30h 31h 32h CR LF
0x33 0x34 0x35 CR LF
$36 $37 $38 CR LF
'String data 1
string1 CR LF
'String data 2
"string2" CR LF
'Special characters
"\"" CR LF
"\'" CR LF
"\\" CR LF
"\0" CR LF
which should print:
0123
012
345
678
String1
String2
“
‘
But it does not print anything; it only writes the output to a file with the same name as the port, in the same directory. For example, my command above creates the file c:\ESDPRT001.
Can anybody help me with this?
To output the data to a USB Printer Class printer, you need to specify the port as below:
senddat.exe sample.txt USBPRN0
"USBPRN0" is just an example; you need to use the correct number for your test PC environment.
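When debugging a script file like the sample in the question, it can help to check which bytes each numeric line stands for. The Python sketch below is only an illustration of the notations visible in that sample (decimal, `30h`, `0x33`, `$36`, `CR`, `LF`); it is not the tool's parser, it ignores strings, comments and the other line types, and the names `token_to_byte` and `line_to_bytes` are hypothetical.

```python
NAMED = {"CR": 13, "LF": 10}  # control-code names used in the sample script

def token_to_byte(token):
    """Convert one numeric token from the sample script into a byte value."""
    if token in NAMED:
        return NAMED[token]
    if token.lower().startswith("0x"):
        return int(token[2:], 16)   # 0x33 style
    if token.startswith("$"):
        return int(token[1:], 16)   # $36 style
    if token.lower().endswith("h"):
        return int(token[:-1], 16)  # 30h style
    return int(token)               # plain decimal

def line_to_bytes(line):
    """Convert a whitespace-separated line of numeric tokens into bytes."""
    return bytes(token_to_byte(t) for t in line.split())

decoded = [
    line_to_bytes("48 49 50 51 CR LF"),
    line_to_bytes("30h 31h 32h CR LF"),
    line_to_bytes("0x33 0x34 0x35 CR LF"),
    line_to_bytes("$36 $37 $38 CR LF"),
]
print(decoded)
```

The four lines decode to the text 0123, 012, 345 and 678, each followed by CR LF, which matches the expected printout listed in the question.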

How can I extract some data out of the middle of a noisy file using Perl 6?

I would like to do this using idiomatic Perl 6.
I found a wonderful contiguous chunk of data buried in a noisy output file.
I would like to simply print out the header line starting with Cluster Unique and all of the lines following it, up to, but not including, the first occurrence of an empty line. Here's what the file looks like:
</path/to/projects/projectname/ParameterSweep/1000.1.7.dir> was used as the working directory.
....
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
Input file: "../../filename.count.fa"
...
Here's what I want parsed out:
Cluster Unique Sequences Reads RPM
1 31 3539 3539
2 25 2797 2797
3 17 1679 1679
4 21 1636 1636
5 14 1568 1568
6 13 1548 1548
7 7 1439 1439
One-liner version
.say if /Cluster \s+ Unique/ ff^ /^\s*$/ for lines;
In English
Print every line from the input file, starting with the one containing the phrase Cluster Unique and ending just before the next empty line.
Same code with comments
.say # print the default variable $_
if # do the previous action (.say) "if" the following term is true
/Cluster \s+ Unique/ # Match $_ if it contains "Cluster Unique"
ff^ # Flip-flop operator: false until the preceding term becomes true,
# then true until the term after it becomes true (that line is excluded)
/^\s*$/ # Match $_ if it is an empty (or whitespace-only) line
for # Create a loop placing each element of the following list into $_
lines # Create a list of all of the lines in the file
; # End of statement
Expanded version
for lines() {
.say if (
$_ ~~ /Cluster \s+ Unique/ ff^ $_ ~~ /^\s*$/
)
}
lines() is like <> in perl5. Each line from each file listed on the command line is read in one at a time. Since this is in a for loop, each line is placed in the default variable $_.
say is like print except that it also appends a newline. When written with a starting ., it acts directly on the default variable $_.
$_ is the default variable, which in this case contains one line from the file.
~~ is the match operator that is comparing $_ with a regular expression.
// Create a regular expression between the two forward slashes
\s+ matches one or more spaces
ff is the flip-flop operator. It is false as long as the expression to its left is false. It becomes true when the expression to its left evaluates as true, and it becomes false again when the expression to its right becomes true. In this input the left expression never matches again, so it stays false for the rest of the file. If we used ^ff^ instead of ff^, the header would not be included in the output.
When ^ comes before (or after) ff, it modifies ff so that it is also false on the iteration in which the expression to its left (or right) becomes true.
/^\s*$/ matches an empty line
^ matches the beginning of a string
\s* matches zero or more spaces
$ matches the end of a string
By the way, the flip-flop operator in Perl 5 is .. when it is in a scalar context (it's the range operator in list context). But its features are not quite as rich as in Perl 6, of course.
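For readers who do not know Perl, the behaviour of `ff^` described above can be emulated with an explicit state flag. The Python sketch below is an approximation under stated assumptions: the function name `flip_flop_excl_end` is made up, the sample input is a shortened version of the file in the question with the empty line written out, and, as a simplification, the end pattern is not tested on the same line that matches the start pattern.

```python
import re

def flip_flop_excl_end(lines, start, end):
    """Yield lines from the one matching `start` up to, but not including,
    the next line matching `end`. Re-arms if `start` matches again later."""
    active = False
    for line in lines:
        if not active:
            if start.search(line):
                active = True
                yield line       # the start line itself is included
        elif end.search(line):
            active = False       # the end line is excluded
        else:
            yield line

lines = [
    "</path/to/projects/projectname> was used as the working directory.",
    "....",
    "Cluster Unique Sequences Reads RPM",
    "1 31 3539 3539",
    "2 25 2797 2797",
    "",
    'Input file: "../../filename.count.fa"',
]
start = re.compile(r"Cluster\s+Unique")
end = re.compile(r"^\s*$")
selected = list(flip_flop_excl_end(lines, start, end))
print("\n".join(selected))
```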
I would like to do this using idiomatic Perl 6.
In Perl, the idiomatic way to locate a chunk in a file is to read the file in paragraph mode, then stop reading the file when you find the chunk you are interested in. If you are reading a 10GB file, and the chunk is found at the top of the file, it's inefficient to continue reading the rest of the file--much less perform an if test on every line in the file.
In Perl 6, you can read a paragraph at a time like this:
my $fname = 'data.txt';
my $infile = open(
$fname,
nl => "\n\n", #Set what perl considers the end of a line.
); #Removed die() per Brad Gilbert's comment.
for $infile.lines() -> $para {
if $para ~~ /^ 'Cluster Unique'/ {
say $para.chomp;
last; #Quit reading the file.
}
}
$infile.close;
# ^ Match start of string.
# 'Cluster Unique' By default, whitespace is insignificant in a perl6 regex. Quotes are one way to make whitespace significant.
However, in perl6 rakudo/moarVM the open() function does not read the nl argument correctly, so you currently can't set paragraph mode.
Also, there are certain idioms that are considered by some to be bad practice, like:
Postfix if statements, e.g. say 'hello' if $y == 0.
Relying on the implicit $_ variable in your code, e.g. .say
So, depending on what side of the fence you live on, that would be considered a bad practice in Perl.
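The efficiency point made in this answer, that reading should stop once the chunk has been found, does not depend on paragraph mode. As a hedged illustration in Python rather than Perl 6, the sketch below uses lazy iterators so that nothing after the terminating empty line is consumed. The names `first_chunk` and `source` are hypothetical, and the `consumed` list exists only to show how far the input was read.

```python
import re
from itertools import dropwhile, takewhile

HEADER = re.compile(r"^Cluster\s+Unique")

def first_chunk(lines):
    """Return the header line and the lines after it, stopping at the first empty line."""
    from_header = dropwhile(lambda line: not HEADER.search(line), lines)
    return list(takewhile(lambda line: line.strip() != "", from_header))

consumed = []  # records which lines were actually read from the source

def source():
    for line in [
        "noise",
        "Cluster Unique Sequences Reads RPM",
        "1 31 3539 3539",
        "",
        "Input file: x",
        "more noise",
    ]:
        consumed.append(line)
        yield line

chunk = first_chunk(source())
print(chunk)
print(len(consumed))
```

With a real file, passing the open file object in place of `source()` gives the same early exit, because file objects are iterated line by line.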

How do I convert this "array of bytes" to PDF in ROR (Ruby)?

Over the web service, I am returned an array of bytes. Part of which looks like the following... How do I get this back to a file? (It started as a pdf)
rg1uje94ppbarWm6azwlDCJeHFFJuXlMN532v46qiyi2u/WNVHCgl10DFe64oZVSFKHN7pZ6qaulNHULZJjix33PWhzPLBVwcbptx5Husx+a7Y4q3T76KBu7pfjvXeav1emibcBSG2mMFakTv0Ho7LvYsVf57hzUq8ptL752worpSKa3L0s9IJ6Z6qIlFzDaXW4ml+3WCWvaHhUW2H+6xfFSuhjHzL8pKmd5t3aI8vsun16YY1VwLw9ivAGX+GUPRVBOTYpVqgLikJhKB7Fkpn5SJSATFAQGoviYGsw7A+B2hA0dpVlisUf0mvC2LjYwfEhcUPGmvwG3sRpGJkUPtzXWx+5a2UaTOtytnLR9qwFbXKf8s2DxS9dR/p+/rwjb9mr24p7E2e8e/ZWNP7dpX7V7xJWpLAxu67lOYhixHFPRZff6063L5q8yGXtOc/J/YP5sSev6l8trGk+c+WNXSa5+b7PfpqY/WJbkefxp4Xe5RfaHqx6oqU/o9ObBdjn3MDm3MzkvFmrvWaXfPavC9s6/8gZZdMeI3cPyp8n/nBSnpjXYUwelZlyKm+ek7Pl8YfhXM4c6uTwxhPyJvZscfRnzaSd7cwWLTs3zj8ucXWe7TGzR+NGXumfk7HqVXCAkrJVS/T+uNDXKHSh5viMpPuTzW+vXu7vIj7eOXLT47XX1vYynzBdcaGx1qo0qrEijTL81UcSZRrFwS7Zv72L/paRvgswpPVdNKe/Qq9hT2R/XQXC8De/HaGVqkC1rkqFIxCto1vzFn1+1xGpOgu+fG/I7P7NBiqm+Ri823b7edVvMEvoIuVLjvjJ7Mv3nTRcV2ZKn+CeR06xqGtHnfN6XVCyyiRx8d2DdxbM0Whz19Imd928mSGz9KpLbXZ0NZhaNX7e08BjbR4fsO+fcdZ7fnhMz0FN2rEnplApbV+aLRt/zHFc15fDpt3/6Kz77vjM+aGNgjJ/eaCpseryirwPdcHuovZPLr3sVRnp2XZwpwH5hwrK0u3vB
I've tried a few things; the closest is as follows (though I am unsure from the output whether it is correct):
File.open(pdf_filename, 'w' ) do |output|
byteArray.each_byte do | byte |
output.print byte
puts byte
end
end
which returns in the console the following but does not create a valid file ( I assume these numbers are the bytes in Integer (base10)form or something?) :
77
52
79
89
57
etc..
I am no expert; I am learning Ruby myself at the moment (looking at questions on SO to vary techniques a bit ;-), but have you tried:
File.open(pdf_filename, 'wb' ) do |output|
byteArray.each_byte do | byte |
output.print byte
puts byte
end
end
or maybe even this (I really don't know if it will work, as I don't have Ruby installed here to test):
File.open(pdf_filename, 'wb') { |output|
output << byteArray
}
I got this info from here (among other places):
http://strugglingwithruby.blogspot.com/2008/11/ruby-file-access.html
Binary files are just the same; you just add a b to the second parameter of the open method.
Depending on your byte array format, you may need to use the unpack method.
File.open(pdf_filename, 'wb' ) do |output|
output << byteArray.unpack("m").first # unpack returns an array; write its first element, the decoded string
end
See the following for possible parameters in the unpack method:
http://www.codeweblog.com/ruby-string-pack-unpack-detailed-usage/
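The "m" directive used above is Base64 decoding, so the whole fix amounts to "decode the text, then write the result in binary mode". As a language-neutral check of that idea, here is a minimal Python sketch. It assumes the web service payload really is Base64 (the fragment in the question looks like it, but it is only a fragment); the function name `write_decoded` and the tiny stand-in PDF content are made up for the example.

```python
import base64
import os
import tempfile

def write_decoded(b64_text, path):
    """Decode Base64 text and write the resulting bytes to `path` in binary mode."""
    data = base64.b64decode(b64_text)
    with open(path, "wb") as out:  # "wb": no newline or encoding translation
        out.write(data)
    return len(data)

# Stand-in for the service response: a few bytes that start like a PDF.
original = b"%PDF-1.4\n%\xe2\xe3\xcf\xd3\n"
encoded = base64.b64encode(original).decode("ascii")

path = os.path.join(tempfile.mkdtemp(), "out.pdf")
write_decoded(encoded, path)

with open(path, "rb") as f:
    restored = f.read()
print(restored == original)
```

A quick sanity check on real data is that the decoded bytes begin with %PDF; if they do not, the payload is probably not plain Base64 of the file.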
