POWERSHELL: Drop X number of beginning characters in a file - parsing

I have vendor-proprietary files that I am converting to CSV. I need to delete the first 7 characters of each file. These characters are a mix of printable and non-printable characters.
For example, one file might have
$([char]0x56)$([char]0x28)$([char]0x00)$([char]0x00)$([char]0x4C)$([char]0x01)$([char]0x01)
And the next file might have
$([char]0x4F)$([char]0xE7)$([char]0x00)$([char]0x00)$([char]0x4C)$([char]0x01)$([char]0x01)
And the next file might have something completely different.

Even simpler:
(Get-Content <CSV file path> | Out-String).Substring(7) | Out-File <CSV file path>
To do this for all CSV files in a directory:
gci <path to directory>\*.csv | ForEach-Object { (Get-Content $_.FullName | Out-String).Substring(7) | Out-File $_.FullName }


Using grep to extract reads from nanopore fastq files

I'm trying to extract a specific sequence from a FASTQ file, using grep to search for the sequence ID:
less all_barcode03.fastq.gz
@3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t runid=7204dc15205b93bfd6430ca0f3a0218f11ce0787 read=10 ch=120 start_time=2019-04-12T13:55:25Z
TCGGTAGCCACTTCGTTCAGTCAATTTGGGTTGTTTAACCGAGTCTTGTGTGTCCCAGTTACCAGGGTTTTCGCATTTATCGTGAAACGCTTTCGCGTTTTCGTGCGCCGCTTCAGTGATCAGTGAAGATGGGTTTGTGGTGGAATACTCTTGCTGCTCATGGCAAACTTTATGTTGGTTTTCTCATGCATTTGTTTCTCGTAATCCCATACGTCATCCAAAGTCATCTGAAAAAGAGGGAAGGGGTGGATTGTGGGTGAAATGTTGTGTACTCCTCTATAATGGGGCTCAGTTGACAAACAGGTGGAGGAGAGGATCATTTGCTTAAAGGGGTGAGTGAAGCGGAGTTTAAGGATAATTCAAGCTTTTAAAAGTGGCTTTAGAGGTAAAGGGTTAGCTCCCATGACCCACAGGATTTATAGGAGATGGCTCTGAACAAACCAGAGCCACACACACA
+
-%&&($$%%#%,-*),-5(&,$$%$%+).'-(+-4-(')%%$*+-,3...14,7/))/03.06-./-3:8.0(*,/7+*,966006.,(*(,-(&(*,./+--902/./),,,0,-/./,4(+0/,0).0-7048,(+*',*/.)*#(((.0--10764+('(%.3/+$&%&'./4'0.;:6.895778+0/*(28/),(+-/404/*'(),.16517&83+*/0/0.--033**$&'*,''*/,,,/..0.*0*0$##*((($/6&('-,.230/01/2+4,,::8719(*.4.'.26/0(*))0*+,(*+-,-+-.4765-$%&.'%.*/')(&''#-()*21,-.;+3).*,,'557686+(-7;-2:8))(&%%'*)**%&&).6&,*(.-'$'(*2+*0587:0+*+)/*/63--/*('#&)-68664&%534)/13.))'14*+**%%$$#
@69e7e435-a78c-4ec8-94cd-b0c1f3c40c11_t runid=7204dc15205b93bfd6430ca0f3a0218f11ce0787 read=15 ch=465 start_time=2019-04-12T13:55:25Z
TCGGTACTTCGTTCGGTTGGAGAAGGTGGTGTTGCCGAGTCTTGTGTCCCAGTTACCAGGGTTTTCGCATTTATCGTGGCTTGCTGCGTTTTCGTGCGCCACCGCTTCATGTGTGTGTGTGTGTCTGGTGTTATTACTCACTTGGCAAGCGTGTCTGGACAGCAGCTGTTTGAGTGTTGAGAGCGCTTCTTCTCCAGGAGAAGCGGTTGAGCCTAAGCTGAATCCCCGTCCGTCTTTATCTTCGGACATGCTCTGGATATGCCTGAGGAGGACAATGGAGGAACAGAACAGATGGATGAAGAGCTCATAAAACTGGCACACATGCATCAAAGCCCACCTTCGTCACTCTGATGACCAGTGACTGCCGTTTATTACTGCGATTTACCATGAAGTTATCTGCTTTTTGGGTCAGTTAGTGTGTGTGTGTGTGTGTGTGTGTGTGCCTTTTCTGTCCTCCAGATACTCAGTACTACAGAGGAGCTATTAATACTTACTACATCGATATGTTATGTAATATCATTCTAGCCTGCTACTCCTGTCTTCTGTATACAACTGTCGTCTGTCCCGAATAGCTCCTGGGTGCCCTCTCCTCCATAGTAGCCACAGTTACAGGAATATTACTCTTTATCATAGAAGCGGTATCTAGTAGAACAGTCCTTAGTTAAAATAATAACGGGGTGTGGGCATGTACAGCCTCTGGTATTCCGTTGCTCAGCAGAGCCTCATAACTCTCCTAGTGGCTCAGGAAGGCTGAAACAGGCTGTGTGCACCCAGCCAGCTGGAACTGTGTTTGAGTGCCATCTTGGAATACTGTTTATAAGCGCTCTTAAGTTATATGTGAGGATGGTGGTATTAGATATGGAAGTGTGTAGGAGGAGAAAGAGGAAATAGTGTCATGTTGATATGAACAGTTTGGTCAGTAAAATGAGGGCAGTAAAAAAGTGTTTTAAGCGTTTTGTCGGTCGACAATATGATAATAAAATGCATTTGGTTCACGATAACAAGAAAACAGAAAAGACCAGCAATGAATATTTAGCATTTTTTGTTTGAAAGATGAAACAAATAATTGAAATAGCTGCCAAATATTTGTGAAATGTACTAAATGGTCAGAGTGAAGATGCAGCTTTGAAAAGAAGATTCGGA
+

Then I try to show one of the sequences by searching for the sequence ID:
grep '@3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t' all_barcode03.fastq.gz
grep '*@3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t*' all_barcode03.fastq.gz
grep @3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t all_barcode03.fastq.gz
grep *@3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t* all_barcode03.fastq.gz
All of the above grep commands return no results, even though there is a line in the file starting with @3cb04ae7-2c7b-4da8-8d09-59edb5b8f45c_t.
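Use zgrep, not grep, on .gz files. Plain grep searches the compressed bytes, which do not contain the sequence ID as plain text. From the man page:
zgrep - search possibly compressed files for a regular expression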
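The difference can be seen on a tiny test file. The sketch below is only an illustration: the file name demo.fastq.gz and the read IDs are made up, and it assumes that zgrep is installed and that grep supports -A, as GNU and BSD grep do.

```shell
# Build a tiny gzip-compressed FASTQ file with made-up reads (hypothetical data).
workdir=$(mktemp -d)
printf '%s\n' \
  '@read-0001 ch=1' 'ACGT' '+' '!!!!' \
  '@read-0002 ch=2' 'TTGA' '+' '####' \
  | gzip -c > "$workdir/demo.fastq.gz"

# Plain grep scans the compressed bytes, so the header text is not found.
# grep exits non-zero when nothing matches, hence the "|| true".
plain_hits=$(grep -c '@read-0002' "$workdir/demo.fastq.gz" || true)

# zgrep decompresses first, then searches.
zgrep_hits=$(zgrep -c '@read-0002' "$workdir/demo.fastq.gz")

# Equivalent explicit pipeline. -A 3 also prints the three lines after the
# header, which gives the full record if each read spans exactly four lines.
record=$(gzip -dc "$workdir/demo.fastq.gz" | grep -A 3 '^@read-0002')

echo "plain grep matches: $plain_hits, zgrep matches: $zgrep_hits"
echo "$record"
rm -rf "$workdir"
```

Anchoring the pattern with ^ restricts the match to header lines, which matters because @ is also a valid character in the quality string.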

How to sort the output of recursive “grep -lr” chronologically by newest modification date last?

I want to get a list of all files, in the current directory or any subdirectory, containing a certain string sorted by modification date.
I am having trouble getting the answer to
How to sort the output of "grep -l" chronologically by newest modification date last?
to work for a recursive grep search. How do I obtain such an ordered list so that all files that would be found by grep -lr are really included?
Assuming your file names don't contain newlines:
find dir -type f -printf '%T@\t%p\n' | sort | cut -f2- | xargs grep -l whatever
More robustly, using GNU versions of the tools to deal with dir/file names containing exotic characters:
find dir -type f -printf '%T@\t%p\0' | sort -z | cut -z -f2- | xargs -0 grep -l whatever
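To see the ordering in action, here is a small sketch on a throwaway directory tree. The file names, timestamps and the search string needle are hypothetical. It assumes GNU find (for -printf) and adds sort -n so the epoch timestamps are compared numerically.

```shell
# Hypothetical demo tree: three files, two of which contain "needle".
workdir=$(mktemp -d)
mkdir -p "$workdir/sub/deeper"
echo 'needle here' > "$workdir/new.txt"
echo 'needle too'  > "$workdir/sub/old.txt"
echo 'nothing'     > "$workdir/sub/deeper/mid.txt"

# Set explicit modification times (touch -t takes [[CC]YY]MMDDhhmm).
touch -t 202301010000 "$workdir/new.txt"
touch -t 202101010000 "$workdir/sub/old.txt"
touch -t 202201010000 "$workdir/sub/deeper/mid.txt"

# %T@ prints the mtime as seconds since the epoch; sort -n puts the oldest
# file first, cut drops the timestamp column, and grep -l reports matching
# files in the order it receives them.
matches=$(find "$workdir" -type f -printf '%T@\t%p\n' \
  | sort -n | cut -f2- | xargs grep -l needle)

echo "$matches"
rm -rf "$workdir"
```

The output lists sub/old.txt before new.txt, that is, the newest modification date comes last. File names containing whitespace would need the NUL-separated variant.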

PowerShell copy random files

I am trying to copy 7 random .txt files to a different location, but sub-folders get copied instead of the .txt files.
Here is my script:
$d = @(gci G:\Users\Public\Test) | resolve-path | get-random -count 2
$d | gci | get-random -count 7
Copy-Item $d -destination G:\Users\Public\Videos
What do I need to change?
One possible solution might be to use the PSIsContainer attribute to filter out folders.
I tried the following...
$d = gci "C:\Work\a\*.txt" | Where {$_.psIsContainer -eq $false}| resolve-path | get-random -count 7
Copy-Item $d -destination C:\Work\b
The where clause filtered out anything that was a "container" (a folder), so the test folders I had set up were ignored. If you need .txt files specifically, use the wildcard in the path as above.
Also, if you were to add -recurse then it would presumably search in all sub folders of your original search location and still filter out any "folders" for copying. Though I haven't tested this very thoroughly.
$d = gci "C:\Work\a\*.txt" -recurse | Where {$_.psIsContainer -eq $false}| resolve-path | get-random -count 7

How to select all files of a given filetype EXCEPT ones matching a name pattern?

I'm trying to select all files of a certain type in a given directory EXCEPT ones beginning with certain names. Why didn't this code work?
PS C:\Documents and Settings\wdennis> Get-Item -Path ($AppDir + "reports\*.dbf") | Where-Object {$_.Name -ne "reports*" -or "category*"}
Directory: C:\Program Files\Application\reports
Mode                LastWriteTime     Length Name
----                -------------     ------ ----
-----          1/4/2007   9:37 AM       4842 category.dbf
-----          9/7/2007   1:53 PM      43903 reports.dbf
I'm pretty new to PS, and very tired to boot, so maybe that's why I'm not understanding why this didn't work. How to do this?
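I think that -eq and -ne match the given string literally and don't support wildcards.
Only -like and -notlike support wildcards for pattern matching. In addition, -or "category*" is evaluated as a condition of its own, and a non-empty string is always true, so the filter let every file through.
You can, however, use a regular expression with the -notmatch operator to achieve what you want. Since it's a regular expression, you need to use .* instead of *, and the beginning of the string is marked with ^.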
So you end with this
{$_.Name -notmatch "^reports.*|^category.*"}
The whole command
Get-Item -Path ($AppDir + "reports\*.dbf") | Where-Object {$_.Name -notmatch "^reports.*|^category.*"}

Grep in multiple files prints matches line with file name

I'm using grep to find lines in two different files that match patterns from another file. It finds the patterns from File1 in File2 and File3 just fine, but as soon as more than one file is searched, it prints the name of the file in which each match was found next to the line.
grep -w -f File1 File2 File3
Output:
File2: pattern
File2: pattern
File3: pattern
Is there an option to avoid the print of File2: and File3:?
grep --no-filename -w -f File1 File2 File3
If you're on a UNIX system, please refer to the man pages. Whenever you encounter a problem, your first step should be man $programName. In this case, man grep. It appears that you want the "-h" option. Here's an excerpt from the man page:
-h, --no-filename
Suppress the prefixing of file names on output. This is the default when there is only one file (or only standard input) to search.
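A quick way to compare both behaviours is sketched below. The file names patterns, a.txt and b.txt and their contents are invented stand-ins for File1, File2 and File3.

```shell
# Hypothetical stand-ins for File1 (patterns), File2 (a.txt) and File3 (b.txt).
workdir=$(mktemp -d)
printf '%s\n' 'apple' 'pear'         > "$workdir/patterns"
printf '%s\n' 'apple' 'plum' 'pear'  > "$workdir/a.txt"
printf '%s\n' 'pear tree' 'fig'      > "$workdir/b.txt"

# Default with several input files: every match is prefixed with its file name.
with_names=$(cd "$workdir" && grep -w -f patterns a.txt b.txt)

# -h (long form --no-filename in GNU grep) suppresses the prefix.
without_names=$(cd "$workdir" && grep -h -w -f patterns a.txt b.txt)

echo "$with_names"
echo '---'
echo "$without_names"
rm -rf "$workdir"
```

The first listing shows lines such as a.txt:apple, while the second shows only the matching lines themselves.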
