Parsing a Large Text File in Sections in MATLAB

I have a large text file, as below, imported into MATLAB:
Run Lat Long Time
1 32 32 34
1 23 22 21
2 23 12 11
2 11 11 11
2 33 11 12
and so on, up to 10 runs.
I'm trying to split the file by run and write each section to its own text file, 10 files in total: file 1 will hold the data from run 1, file 2 the data from run 2, and so on.

What you're looking for is MATLAB's textread function. I'll give you the pieces you need and frame out the logic, but you'll need to connect the pieces yourself :)
Your read would look something like this (the second call skips the header line that the first call reads):
[head1, head2, head3, head4] = textread(file_name,'%s %s %s %s',1);
[run, lat, long, time] = textread(file_name,'%u %u %u %u','headerlines',1);
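and your write method would use a loop to iterate over the values in
unique(run)
creating a file for each run_number, opened for writing, with
fout = fopen([base_file_name_out num2str(run_number)], 'w');
selecting the values for that run with logical indexing, e.g.
lat_this_run = lat(run == run_number);
and writing them with
fprintf(fout, '%u %u %u\n', [lat_this_run, long_this_run, time_this_run]');
The transpose matters: fprintf consumes its arguments column by column, so passing the three vectors separately would print all the latitudes before any longitude. Close each file with fclose(fout) when you are done with it.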
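For readers who want to see the read, group, and write logic in one place, here is a minimal sketch in Python rather than MATLAB. It assumes the file is whitespace-separated with a single header row, as in the question; the function names and the `run_` output prefix are made up for illustration.

```python
from collections import defaultdict
from pathlib import Path


def group_rows_by_run(lines):
    """Group whitespace-separated data rows by their first column (the run number)."""
    groups = defaultdict(list)
    for line in lines[1:]:              # skip the header row
        fields = line.split()
        if not fields:                  # ignore blank lines
            continue
        run = int(fields[0])
        groups[run].append(" ".join(fields[1:]))   # keep Lat, Long, Time
    return dict(groups)


def write_run_files(groups, base_name="run_"):
    """Write one text file per run; base_name is a hypothetical output prefix."""
    for run, rows in groups.items():
        Path(f"{base_name}{run}.txt").write_text("\n".join(rows) + "\n")
```

Calling `write_run_files(group_rows_by_run(Path("data.txt").read_text().splitlines()))` would then produce `run_1.txt`, `run_2.txt`, and so on, where `data.txt` stands in for the real input file.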

If your data is already loaded into MATLAB as a numeric matrix named A, you could do:
n = max(A(:,1));
AA = cell(1, n);
for i = 1:n
    AA{i} = A(A(:,1) == i, :);   % rows belonging to run i
    name = sprintf('%d.txt', i);
    dlmwrite(name, AA{i}, '\t');
end
The output will be one .txt file per run, each containing tab-delimited data.

Related

Missing data in time series

I'm new to this field and am exploring the data for a time series. I want to find the missing values, count them, study the distribution of the gap lengths, and fill in the gaps. I have, say, 10 .txt files, and each file has 2 columns as follows:
C1 C2
944 0
920 1
920 2
928 3
912 7
920 8
920 9
880 10
888 11
920 12
944 13
and so on, up to, say, 100. The 10 files do not necessarily have the same number of observations.
In this example the missing values are 4, 5 and 6 in C2, together with the corresponding values in the first column C1 (measured in milliseconds, so the value 928 ms is not a time neighbor of 912 ms). Gaps do not necessarily appear in every file. I want to find these gaps (the total number of missing values across all 10 files) and show a histogram of their lengths.
I wrote a piece of code in R, but it doesn't give me the total number of missing values that I expect:
path = "files path"
out.file <- data.frame(TS = 0, Index = 0, File = '')
file.names <- dir(path, pattern = ".txt")
for (i in 1:length(file.names)) {
  file <- cbind(read.table(file.names[i],
                           header = F,
                           sep = "\t",
                           stringsAsFactors = FALSE),
                file.names[i])
  colnames(file) <- c('TS', 'Index', 'File')
  out.file <- rbind(out.file, file)
}
d = dim(out.file)[1]
misDa = 0
for (i in 2:(d - 1)) {
  if (abs(out.file$Index[i] - out.file$Index[i + 1]) > 1)
    misDa = misDa + 1
}
It's hard to give specific hints without a more extensive example of your data that contains some of the actual NAs.
If you are using R (as it seems), the naniar and imputeTS packages offer nice functions for visualizing missing data.
The naniar package is especially good for multivariate data.
The imputeTS package is especially good for time series data.
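One likely reason the count in the question comes out wrong: the loop adds 1 for every gap rather than the number of values missing inside it, and because all files are stacked into one data frame it also compares the last row of one file with the first row of the next. A minimal Python sketch of the intended counting is shown below. It assumes the index column (C2) is an increasing integer sequence within each file; the function names are made up for illustration.

```python
from collections import Counter


def gap_lengths(indices):
    """Return the length of every gap in an increasing integer index sequence."""
    gaps = []
    for prev, curr in zip(indices, indices[1:]):
        missing = curr - prev - 1       # index values skipped between neighbours
        if missing > 0:
            gaps.append(missing)
    return gaps


def gap_histogram(files):
    """Count gaps by length; files maps a file name to that file's index column.

    Gaps are measured inside each file only, never across file boundaries.
    """
    histogram = Counter()
    for indices in files.values():
        histogram.update(gap_lengths(indices))
    return histogram
```

The total number of missing values is then `sum(length * count for length, count in histogram.items())`, and the histogram itself gives the distribution of gap lengths.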

Comparing files based on two columns

I have two files with thousands of lines:
file1:
COL22A1 LCT 1 12 0.149667616334 2.16226378401
GPRIN2 TP53 12 170 0.0455368539793 44.2359753827
MUC3A TP53 12 170 0.0455368539793 44.2359753827
file2:
COL22A1 LCT 12 41 23 0.0296296296296 0.101234567901 0.0567901234568 2.36563
MEGF10 SORCS1 10 21 39 0.0246913580247 0.0518518518519 0.0962962962963 2.30599
I want to compare the first two columns of these files and, if they match, print the whole line of the second file followed by the last column of the first file:
output:
COL22A1 LCT 12 41 23 0.0296296296296 0.101234567901 0.0567901234568 2.36563 2.16226378401
I tried awk, grep and join, but I always get output from just one file.
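Try the following:
awk 'FNR==NR{a[$1,$2]=$NF; next} ($1,$2) in a {print $0, a[$1,$2]}' Input_file1 Input_file2
The first block runs only while Input_file1 is being read and stores its last column under a key built from the first two columns. The second block runs on Input_file2 and prints every line whose key was stored, followed by the stored value. Testing membership with "in" rather than testing the stored value keeps matches whose last column is 0 or empty.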
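The same two-pass lookup can be sketched in Python, which may be easier to extend if the matching rule grows more complicated. This is only an illustration, assuming whitespace-separated columns as in the samples; the function name is made up.

```python
def join_on_first_two_columns(file1_lines, file2_lines):
    """Append file1's last column to every file2 line whose first two columns match."""
    last_by_key = {}
    for line in file1_lines:
        fields = line.split()
        if len(fields) >= 2:
            last_by_key[(fields[0], fields[1])] = fields[-1]

    joined = []
    for line in file2_lines:
        fields = line.split()
        if len(fields) >= 2 and (fields[0], fields[1]) in last_by_key:
            joined.append(line.rstrip("\n") + " " + last_by_key[(fields[0], fields[1])])
    return joined
```

As in the awk version, if a key occurs more than once in the first file, the last occurrence wins.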

Octave thinks that the image I want to imread() doesn't exist

I'm writing a function in Octave to easily add particles on an image, but I have a problem.
function [ out ] = enparticle( mainImg, particleNames, particleData, frames, fpp, sFrame, eFrame )
  %particleData format:
  % [ p1Xline p1StartHeight p1EndHeight;
  %   p2Xline p2StartHeight p2EndHeight;
  %   p3Xline p3StartHeight p3EndHeight;
  %   ... ]
  %particleNames format:
  % [ p1Name;
  %   p2Name;
  %   p3Name;
  %   ... ]
  pAmount = size(particleData, 1);
  for i = 1:pAmount
    tmp = particleNames(i,:)
    [ pIMG pMAP pALPHA ] = imread( tmp );
  end
end
When I run this simple code with
enparticle( "ffield.png", [ "p_dot.png"; "p_star.png"; "p_dot.png" ], [ 100 50 100; 200 50 100; 300 50 100 ], 30, 10, 5, 25 )
I get this written in console
tmp = p_dot.png
error: imread: unable to find file p_dot.png
error: called from
imageIO at line 71 column 7
imread at line 106 column 30
enparticle at line 24 column 23
When I try to imread() the file this way, Octave thinks there is no file with that name. But there is one, in the same folder as the script file.
The most curious thing is that, when I change
tmp = particleNames(i,:)
to
tmp = particleNames(:,:)
and Octave assigns all the names to tmp as an array, it magically finds all the files with the given names.
But that's not how I want it to work, because all the files would then be replaced, or merged, or something similar during image processing.
The reason I'm doing it this way is that I want to put every frame (image and alpha) separately into a cell array later.
I don't have a clue what I'm doing wrong, and I can't find anything about it online either :(
The code:
filenames = [ "p_dot.png"; "p_star.png"; "p_dot.png" ]
does not do what you think it does. This will create a 2 dimensional
array of characters. See
octave> size (filenames)
ans =
3 10
Note that it lists 10 columns. Take a look at your
filenames and you will notice that they are of different
lengths: two have length 9 and one has length 10. But this is just
like a numeric matrix, the only difference being that it holds ASCII
characters. And like a numeric matrix, all rows must have the same length.
What is happening is that the shortest rows get padded with
whitespace. You can confirm this by checking the ASCII decimal codes
of your filenames:
octave> int8 (filenames)
ans =
112 95 100 111 116 46 112 110 103 32
112 95 115 116 97 114 46 112 110 103
112 95 100 111 116 46 112 110 103 32
Note how the first and third rows end in '32'. In ASCII, that's the
space character (see the Wikipedia article about
ASCII, which has the tables).
So the problem is not that imread does not find a file named
'p_dot.png'; the problem is that it does not find a file named
'p_dot.png '.
You should not be using character arrays for this. Instead, use a
cell array of char arrays, like so:
filenames = {"p_dot.png", "p_star.png", "p_dot.png"}
for i = 1:numel (filenames)
  fname = filenames{i};
  [pIMG, pMAP, pALPHA] = imread (fname);
  ## do some stuff
endfor
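The padding effect is easy to reproduce outside Octave. The following Python sketch only imitates what a 2-D character array does to rows of unequal length (it is not how Octave is implemented), and the function name is made up for illustration.

```python
def as_char_matrix(names):
    """Pad every name with trailing spaces to the length of the longest one."""
    width = max(len(name) for name in names)
    return [name.ljust(width) for name in names]


rows = as_char_matrix(["p_dot.png", "p_star.png", "p_dot.png"])
first = rows[0]                 # "p_dot.png" plus one trailing space
last_code = ord(first[-1])      # ASCII code of the padding character
trimmed = first.rstrip()        # the file name that actually exists on disk
```

Here `last_code` is 32, the same value that shows up at the end of the first and third rows in the int8 output above, and `trimmed` is the name without the padding.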

How to use F# TypeProvider to read PowerBall csv?

The powerball schema and separators are not consistent which makes it an unusual file to read. (http://www.powerball.com/powerball/winnums-text.txt)
Sample:
Draw Date WB1 WB2 WB3 WB4 WB5 PB PP
09/24/2016 15 07 29 41 20 22 2
09/21/2016 63 67 01 69 28 17 4
09/17/2016 51 19 09 62 55 14 4
Any suggestions?
This looks like a "fixed column width" file rather than an ordinary CSV (meaning that the columns are not separated by any single character, but instead have fixed number of characters, with padding spaces).
There is some early work on supporting this in F# Data in the pull request here. We'd welcome any help getting this tested, but you'd need to get the source code and build F# Data from source (which is just a matter of running the build script, though!)
Alternatively, you could probably do some simple pre-processing on the file before reading it as an ordinary CSV file. Looking at the sample file, using a regular expression to replace 1 or more spaces with a comma would produce regular CSV that the CSV provider can consume.
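The pre-processing step could look roughly like the Python sketch below (Python is used only for illustration; the same regular expression works in F#). One caveat: the header label "Draw Date" itself contains a space, so a blanket replacement would give the header one more field than the data rows. The sketch therefore swaps in its own header line, whose column names are an assumption, and it further assumes every data row has the same number of fields.

```python
import re

# Hypothetical column names; "Draw Date" is written without its space so that
# the header has the same number of fields as the data rows.
HEADER = "DrawDate,WB1,WB2,WB3,WB4,WB5,PB,PP"


def fixed_width_to_csv(text, header=HEADER):
    """Turn space-padded rows into comma-separated rows, replacing the header line."""
    out = [header]
    for line in text.splitlines()[1:]:      # drop the original header line
        line = line.strip()
        if line:                            # skip blank lines
            out.append(re.sub(r"\s+", ",", line))
    return "\n".join(out)
```

The resulting text can be saved to a file and handed to an ordinary CSV reader.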

Xcode: retrieving one line of a CSV based on a search query

Here is a sample of my CSV
10820 0 0 0 0
10900 2 4 4 4
11000 21 50 54 58
11100 23 54 59 63
11200 25 59 63 68
11300 27 63 68 73
11400 29 68 73 78
11500 31 72 78 83
11600 32 76 82 88
11700 34 81 87 93
I'm looking to use Xcode to retrieve one line from this very lengthy CSV based on the value in the first column.
For example:
If the user enters "10900", the columns of the second line will be returned.
If the user enters 11650, the columns of the 11600 line will be returned, always taking the lower line when the input value is less than the first column of the following line.
Any help would be appreciated. I've seen code to parse an entire CSV file, but I'm thinking this may be a big memory drain. Right now my CSV has 2000 lines of values, all in ascending order based on the first column.
You have to load the file into memory anyway to find the correct value.
With such a big CSV file, I would recommend converting it to a binary file (a plist file, for example) and bundling that with your application, instead of parsing the CSV each time at runtime. It performs much better and is easier to work with, since you are working directly with NSDictionary and NSArray objects.
If you don't want to do that for some reason, the next option is to use something like CHCSVParser:
https://github.com/davedelong/CHCSVParser
It can load only part of the file at a time, which is the optimization you might be looking for.
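Whichever way the rows are loaded, the lookup itself is a "floor" search on a sorted first column, which a binary search handles well. The sketch below shows the logic in Python purely as a language-neutral illustration; it would need to be ported to Swift or Objective-C for an actual app. The function names are made up, and the parser accepts either commas or whitespace as separators because the sample does not make the delimiter clear.

```python
import re
from bisect import bisect_right


def load_table(lines):
    """Parse rows of integers; the first column must be in ascending order."""
    rows = [[int(x) for x in re.split(r"[,\s]+", line.strip())]
            for line in lines if line.strip()]
    keys = [row[0] for row in rows]
    return keys, rows


def lookup_floor(keys, rows, value):
    """Return the row with the largest first column <= value, or None if there is none."""
    pos = bisect_right(keys, value) - 1
    if pos < 0:
        return None
    return rows[pos]
```

With the sample data, looking up 10900 returns the 10900 row and looking up 11650 returns the 11600 row, matching the behaviour described in the question.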
