Merge tab delimited text files into a single file


What is the easiest way to join/merge all of the tab-delimited files in a folder into a single file? They all share a unique column (a primary key). Actually, I only need to combine one particular column, linked on this primary key, so the output file would contain a new column for each file. Example:
KEY# Ratio1 Ratio2 Ratio3
1 5.1 4.4 3.3
2 1.2 2.3 3.2
...
There are many other columns in each file that I don't need in the output file; I just need these "ratio" columns linked by the unique key column.
I am running OS X Snow Leopard but have access to a few Linux machines.

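Use the join(1) utility. Note that join merges two files at a time and expects both inputs to be sorted on the join field (use -t with a tab character for tab-delimited input), so for many files you would join them pairwise, one after another.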
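Because join(1) handles two files per call, here is a Python sketch of the same key-based merge applied to every file in a folder at once. It assumes each file has a header row that names the key column (KEY# here) and the value column. The column names, the glob pattern, and the use of each file's base name as its output column header are all assumptions. Keys are sorted as strings, and a key missing from a file leaves an empty cell, as an outer join would.

```python
import csv
import glob
import os


def merge_ratio_columns(pattern, key_col, value_col):
    """Join one column from every tab-delimited file matching `pattern` on a shared key.

    key_col and value_col are header names (assumed; adjust them to your files).
    Returns (header, rows), where each row is [key, value_from_file1, value_from_file2, ...].
    """
    paths = sorted(glob.glob(pattern))
    merged = {}  # key -> {file index: value}
    for i, path in enumerate(paths):
        with open(path, newline="") as fh:
            for record in csv.DictReader(fh, delimiter="\t"):
                merged.setdefault(record[key_col], {})[i] = record[value_col]
    # One output column per input file, named after the file (without extension)
    header = [key_col] + [os.path.splitext(os.path.basename(p))[0] for p in paths]
    # Missing keys become empty cells, like an outer join
    rows = [[key] + [vals.get(i, "") for i in range(len(paths))]
            for key, vals in sorted(merged.items())]
    return header, rows


def write_tsv(path, header, rows):
    """Write the merged table as a tab-delimited file."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh, delimiter="\t")
        writer.writerow(header)
        writer.writerows(rows)
```

Usage would look like `write_tsv("combined.txt", *merge_ratio_columns("*.txt", "KEY#", "Ratio"))`. Unlike reading the files line by line in lockstep, this matches rows by key, so the files don't need to be in the same order or have the same number of rows.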

I actually spent some time learning Perl and solved the issue on my own. I figured I'd share the source code in case anyone has a similar problem to solve.
#!/usr/bin/perl
# File: combine_all.pl
# Description: This program will combine the rates from all "gff" files in the current directory.
use strict;
use warnings;

my (@handles, @files);
print "Process starting... Please wait, this may take a few minutes...\n";
unlink "_combined.out";    # remove the output file if it already exists
for (<./*.gff>) {
    my @file = split("_", $_);
    push(@files, substr($file[0], 2));    # "./name_rest.gff" -> "name"
    open($handles[@handles], '<', $_) or die "Cannot open $_: $!";
}
open(my $out, '>', "_combined.out") or die "Cannot write _combined.out: $!";
foreach (@files) {
    print $out "$_\t";    # header row: one column per input file
}
# print $out "\n";    # left disabled, as in the original
my $continue = 1;
while ($continue) {
    $continue = 0;
    my ($key, $gibberish) = ("", 0);
    # Read one line from every file in lockstep
    for my $op (@handles) {
        if (defined($_ = readline($op))) {
            $continue = 1;
            my @col = split;    # whitespace-separated columns of $_
            if ($col[8]) {
                # Full 9-column GFF record: column 4 (start) is the key, column 6 (score) the rate
                $gibberish = 0;
                $key = $col[3] + 0;
                $col[5] = sprintf("%.2f", $col[5] + 0);    # +0 forces a numeric value
                print $out "$col[5]\t";
            } else {
                # Comment or short line: no rate to print
                $key = "\t";
                $gibberish = 1;
            }
        }
    }
    # End the row with the key, or with a bare newline after a comment line
    if ($continue && !$gibberish) {
        print $out "$key\n";
    } else {
        print $out "\n";
    }
}
close($_) for @handles;    # close all input files
close($out);
print "Process Complete! The output file is located in the current directory with the filename: _combined.out\n";

Related

ImageJ/Fiji - Save CSV using macro

I am not a coder, but I'm trying to turn ThunderSTORM's batch process into an automated one with a single input folder and a single output folder.
input_directory = newArray("C:\\Users\\me\\Desktop\\Images");
output_directory = ("C:\\Users\\me\\Desktop\\Results");
for(i = 0; i < input_directory.length; i++) {
open(input_directory[i]);
originalName = getTitle();
originalNameWithoutExt = replace( originalName , ".tif" , "" );
fileName = originalNameWithoutExt;
run("Run analysis", "filter=[Wavelet filter (B-Spline)] scale=2.0 order=3 detector "+
"detector=[Local maximum] connectivity=8-neighbourhood threshold=std(Wave.F1) "+
"estimator=[PSF: Integrated Gaussian] sigma=1.6 method=[Weighted Least squares] fitradius=3 mfaenabled=false "+
"renderer=[Averaged shifted histograms] magnification=5.0 colorizez=true shifts=2 "+
"repaint=50 threed=false");
saveAs(fileName+"_Results", output_directory);
}
This probably looks like a huge mess, but the original batch file used arrays and I can't figure out what they are for. Taking them out breaks it, so I left them in. My main issue is that the saveAs part doesn't work.
Using run("Export Results") works, but then I need to pick a location and file name manually. I tried to set this up to take the image's file name and reuse it, so the CSV is saved under that name.
Can anyone help point out why I'm a moron? I would also love to open only one file at a time (this opens them all) and close it when the analysis is complete. But I'll settle for that happening on a different day if I can just manage to save the damn CSV automatically.
I broke the code a whole bunch of times along the way, but it works in its current form.
I appreciate any and all help. Thank you!
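The per-file logic the question is after (visit the images one at a time and derive each CSV name from the image name) can be sketched in Python as a language-neutral illustration. It does not call ImageJ or ThunderSTORM. In the macro language the listing step would typically use getFileList(dir), and the exact options of ThunderSTORM's export command are best captured with the macro recorder rather than guessed. The "_Results.csv" suffix and the function names are hypothetical.

```python
from pathlib import Path


def results_path(image_path, output_dir, suffix="_Results.csv"):
    """Build the CSV path for one image: same base name, placed in output_dir."""
    # Path.stem drops the extension, like replace(originalName, ".tif", "") in the macro
    return Path(output_dir) / (Path(image_path).stem + suffix)


def plan_batch(input_dir, output_dir, ext=".tif"):
    """Return (image, csv) pairs, one per image, in name order.

    Each pair can then be handled in turn: open the image, run the analysis,
    export to the CSV path, and close the image before moving on.
    """
    images = sorted(p for p in Path(input_dir).iterdir() if p.suffix.lower() == ext)
    return [(img, results_path(img, output_dir)) for img in images]
```

Iterating over a file list, instead of opening the folder, is what allows one image to be open at a time.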

Run Macro in PhpSpreadsheet

I'm working on a project that allows customers to export MySQL data to .xls format, using the PhpSpreadsheet library.
That part is done, but my data contains lots of dates, and some of them are 0000-00-00, which means the date is not used.
I want to replace all of these '0000-00-00' values with '-'.
I used Excel's Find and Replace and saved the steps as a macro (.bas).
What I have tried:
Loading the .bas file with IOFactory and a reader in PHP, but it says the file format is not accepted.
Using the SUBSTITUTE function in the PHP loop that gets the SQL data values:
$activeSheet->setCellValue('L'.$i, '=substitute('L'.$i ,"0000-00-00", "-')');
$i starts at 1 and increases by 1 on each iteration.
This fails because I can't include $i inside SUBSTITUTE() due to the mix of "" and '' quotes. I tried swapping them around, but it seems "0000-00-00" and "-" must be in double quotes; otherwise the library doesn't recognise the formula, and then $i isn't picked up...
Is there any way to solve either of these problems, or can't they be solved in the first place?
I couldn't find any explanation of macros in PhpSpreadsheet from the community or on Google.

Check the value when setting the cell:
if ($datefromselect == '0000-00-00') {
$activeSheet->setCellValueByColumnAndRow($colnum, $rownum, '-');
} else {
$activeSheet->setCellValueByColumnAndRow($colnum, $rownum, $datefromselect);
}
or do it in the SELECT, as in:
SELECT lastname,
if(date_closed = '0000-00-00', '-', date_closed)
FROM `lca_clients`
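The PHP answer's check can be sketched in Python: the substitution happens on the value before it is written, so no spreadsheet formula (and no quoting of $i inside a formula string) is needed. The row data and column indexes below are hypothetical.

```python
ZERO_DATE = "0000-00-00"  # MySQL's "no date" value, as described in the question


def display_value(value, placeholder="-"):
    """Return the placeholder for the zero date, otherwise the value unchanged."""
    return placeholder if value == ZERO_DATE else value


def clean_rows(rows, date_columns):
    """Apply display_value to the given column indexes of every row (rows are lists)."""
    return [
        [display_value(v) if i in date_columns else v for i, v in enumerate(row)]
        for row in rows
    ]
```

For example, `clean_rows([["Smith", "0000-00-00"]], {1})` gives `[["Smith", "-"]]`; the cleaned rows are then written cell by cell as before.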

How to validate files for multiple extension name checking in ruby or ruby on rails

I am trying to check a file name for multiple extensions.
To check the character length, I can do this:
s= 'hello welcome to stackoverflow'
if s.length <= 35
print('okay')
else
print('not accepted')
end
What I want to achieve:
I have a file name that I want to check for a single, double or multiple extension.
I only want to allow file names with a single extension. If a file has more than one extension, it should be rejected as not allowed, as in the code below.
My issue is that I can't find a dot-counting equivalent of the length method.
filename = 'nancy.png'
if filename.length == 1
print('good file because it has only one dot extension name')
else
print('files with two or multiple extension name not allowed')
end

You can simply count the '.' characters; if there is more than one, the name is invalid. You can do that like this:
if filename.count('.') > 1 # assuming filename is string
print 'Invalid'
else
print 'Valid'
end
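The same count-based check, sketched in Python. It requires exactly one dot instead of "not more than one", so a name with no extension is rejected too. Looking only at the base name, and rejecting names that start or end with a dot, are added assumptions, meant to keep dots in directory names and hidden files like ".bashrc" from being misjudged.

```python
import os


def has_single_extension(path):
    """True only when the file's base name has exactly one extension, e.g. 'nancy.png'."""
    name = os.path.basename(path)  # ignore dots in directory names
    return name.count(".") == 1 and not name.startswith(".") and not name.endswith(".")
```

With this check, "nancy.png" passes, while "nancy.tar.gz", "nancy" and ".bashrc" are rejected.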

Require json file dynamically in react-native (from thousands of files)

I have searched on Google and tried to find a solution, but haven't found one yet.
I know require() only works with static paths, so I'm looking for alternative ways to solve my problem. I found an answer here, but it doesn't make sense for thousands of resources.
Please advise on the best approach for such a case.
Background
I have thousands of JSON files containing app data, and I declared all of the file paths dynamically like this:
export var SRC_PATH = {
bible_version_inv: {
"kjv-ot": "data/bibles/Bible_KJV_OT_%s.txt",
"kjv-nt": "data/bibles/Bible_KJV_NT_%s.txt",
"lct-ot": "data/bibles/Bible_LCT_OT_%s.txt",
"lct-nt": "data/bibles/Bible_LCT_NT_%s.txt",
"leb": "data/bibles/leb_%s.txt",
"net": "data/bibles/net_%s.txt",
"bhs": "data/bibles/bhs_%s.txt",
"n1904": "data/bibles/na_%s.txt",
.....
"esv": "data/bibles/esv_%s.txt",
.....
},
....
As you can see, each file path contains '%s', which should be replaced with the right string depending on what the user selected.
For example, if the user selects the Bible with abbreviation "kjv-ot" and chapter 1, the file "data/bibles/Bible_KJV_OT_01.txt" should be imported.
I'm not very experienced with React Native; I'm just wondering if there is an alternative way to handle these thousands of resource files and require only one at a time, based on the user's selection.
Any suggestions are welcome.

Instead of exporting a static object, you could export a function that takes a parameter and builds the paths, like this:
// fileInclude.js
export const generateSourcePath = (sub) => {
return {
bible_version_inv: {
"kjv-ot": `data/bibles/Bible_KJV_OT_${sub}.txt`
}
}
}
//usingFile.js
const generation = require('./fileInclude.js');
const myFile = generation.generateSourcePath('mySub');
const requiredFile = require(myFile.bible_version_inv["kjv-ot"]); // pass one path string, not the whole object
Then you would import (or require) this module into your project and call generateSourcePath('mySub') to get all of your paths.
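The path-building step on its own, sketched in Python with a few of the templates from the question. Zero-padding the chapter to two digits is an assumption based on the "01" example, and the names SRC_PATH_TEMPLATES and chapter_path are hypothetical. This only produces the path string; as the question notes, require() needs static paths, so actually loading the file is a separate problem.

```python
# A few of the '%s' templates from the question
SRC_PATH_TEMPLATES = {
    "kjv-ot": "data/bibles/Bible_KJV_OT_%s.txt",
    "kjv-nt": "data/bibles/Bible_KJV_NT_%s.txt",
    "leb": "data/bibles/leb_%s.txt",
    "esv": "data/bibles/esv_%s.txt",
}


def chapter_path(templates, version, chapter):
    """Fill the single '%s' placeholder with the chapter number padded to two digits."""
    return templates[version] % f"{chapter:02d}"  # 1 -> "01", 12 -> "12"
```

`chapter_path(SRC_PATH_TEMPLATES, "kjv-ot", 1)` yields "data/bibles/Bible_KJV_OT_01.txt", which matches the example in the question. An unknown version raises KeyError, so invalid selections fail early.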

Where can I find a large tabbed hierarchical data set for parser testing?

First, apologies as I realize this is only tangentially related to parser programming.
I've spent hours looking for a text file containing something like the following, but with hundreds (hopefully thousands) of sub-entries. A complete biological classification file would be perfect. A massive version of the following would be great, as my parser parses simple tabbed files:
TL;DR - I need a massive single-file hierarchical data set, something like the following:
Kingdoms
	Monera
	Protista
	Fungi
	Plants
	Animals
		Porifera
			Sponges
		Coelenterates
			Hydra
			Coral
			Jellyfish
		Platyhelminthes
			Flatworms
			Flukes
		Nematodes
			Roundworms
			Tapeworms
		Chordates
			Urochordates
			Cephalochordates
			Vertebrates
				Fish
				Amphibians
				Reptiles
				Birds
				Mammals
The best I've been able to find are tree-of-life images (from which I transcribed the sample data set above). A single file with a TON of real data would be awesome. It doesn't have to be a biological classification data set, but I would really like the data to reflect something in the real world. (My parser feeds a menu, so it would be great if the rest of my testing used a data set that actually meant something!) Even a file that isn't tabbed would be great, as long as the data can be converted to a tabbed format fairly easily with a regex.
Any ideas? Thanks!

It is possible that the XML layout has changed since the other answer was written, but the code submitted there is no longer accurate: the resulting dump contains extraneous entries. Some of the nodes have aliases (denoted as 'othername') that are reported as distinct nodes themselves.
I used the script below to generate the correct dump.
<?php
// Print the tree as tab-indented names, skipping alias (OTHERNAMES) entries
$reader = new XMLReader();
$reader->open('http://tolweb.org/onlinecontributors/app?service=external&page=xml/TreeStructureService&node_id=1'); // change node_id as needed; 15963 is the primates index
$set = -1; // 1 while inside an OTHERNAMES block, -1 otherwise
while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLREADER::ELEMENT):
            if ($reader->name == "OTHERNAMES") {
                $set = 1; // NAME elements that follow are aliases
            }
            if ($reader->name == "NODES") {
                $set = -1;
            }
            if ($reader->name == "NODE") {
                $set = -1; // a new node starts, so its NAME is the real one
            }
            if ($reader->name == "NAME" AND $set == -1) {
                echo str_repeat("\t", $reader->depth - 2); // repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
?>
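For comparison, the same idea (one tab-indented line per NODE, with aliases under OTHERNAMES ignored) sketched in Python with xml.etree.ElementTree. The sample XML is hypothetical and only mimics the element names the PHP script looks for (NODE, NODES, NAME, OTHERNAMES). The real tolweb.org layout may differ. This version indents one tab per tree level, whereas the PHP script derives indentation from raw XML depth. It also loads the whole document into memory, so for the large feeds a streaming parser such as ET.iterparse (like XMLReader) would be lighter.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample mimicking the element names used by the PHP script
SAMPLE = """<TREE><NODE><NAME>Animals</NAME>
<OTHERNAMES><OTHERNAME><NAME>Metazoa</NAME></OTHERNAME></OTHERNAMES>
<NODES><NODE><NAME>Chordates</NAME><NODES><NODE><NAME>Vertebrates</NAME></NODE></NODES></NODE>
<NODE><NAME>Nematodes</NAME></NODE></NODES></NODE></TREE>"""


def tabbed_names(node, depth=0):
    """Yield one tab-indented line per NODE, using only that node's own NAME."""
    name = node.find("NAME")  # direct child only, so OTHERNAMES/.../NAME is skipped
    if name is not None:
        yield "\t" * depth + (name.text or "").strip()
    children = node.find("NODES")
    if children is not None:
        for child in children.findall("NODE"):
            yield from tabbed_names(child, depth + 1)


def tree_to_tabs(xml_text):
    """Convert the whole document to tab-indented text."""
    root = ET.fromstring(xml_text)
    lines = []
    for top in root.findall("NODE"):
        lines.extend(tabbed_names(top))
    return "\n".join(lines)
```

`print(tree_to_tabs(SAMPLE))` prints Animals, then Chordates and Nematodes one tab in, with Vertebrates two tabs in; the alias Metazoa does not appear.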

This turned out to be such a pain in the ass. I finally tracked down a data feed from "The Tree of Life Web Project" at tolweb.org, and wrote the PHP script below to provide the basic functionality my post was looking for.
Change the node_id to have it print a tabbed representation of any of tolweb.org's data: just take the id from the page you're browsing on their site and change the node_id below.
Be aware, though, that their data feeds serve up large files, so definitely download the file to your own server (and change the open() call below to point to the local file) if you're going to hit it more than once or twice.
More info on tolweb.org data feeds can be found here:
http://tolweb.org/tree/home.pages/downloadtree.html
<?php
$reader = new XMLReader();
$reader->open('http://tolweb.org/onlinecontributors/app?service=external&page=xml/TreeStructureService&node_id=15963'); //15963 is the primates index
while ($reader->read()) {
    switch ($reader->nodeType) {
        case (XMLREADER::ELEMENT):
            if ($reader->name == "NAME") {
                echo str_repeat("\t", $reader->depth - 2); //repeat tabs for depth
                $node = $reader->expand();
                echo $node->textContent . "\n";
            }
            break;
    }
}
?>
