Below is a very useful command to convert your file to a .kml file so you can plot it in Google Earth.
output = ge_plot(data(:,2),data(:,1));
ge_output('name.kml',output);
But this works only for one file at a time. Let's say I have n files, so I want to create n .kml files.
I have changed the first line of the code to this:
for i = 1:n
    each_traj{i} = out(:,:,i);
    output{i} = ge_plot(each_traj{1,i}(:,2), each_traj{1,i}(:,1));
end
And it works. So I have an n-dimensional cell array "each_traj" with the information in every cell.
What about the second line?
ge_output('traj1.kml',output{1,i})
I want to save n trajectories at the same time, each with a different name, of course.
Thank you a lot!
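For the second line, one way is to build the filename inside the same loop with sprintf; a minimal, untested sketch, assuming ge_output accepts any filename string just as in the single-file call above:

for i = 1:n
    % write each trajectory to its own file: traj1.kml, traj2.kml, ...
    ge_output(sprintf('traj%d.kml', i), output{i});
end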
I found a solution. If anyone else is interested in the same problem, here is the sub-code:
for i = 1:count_cases
    each_traj{i} = out(:,:,i);
    kml_line(each_traj{1,i}(:,2), each_traj{1,i}(:,1), sprintf('traj%d', i), 'w', 6);
end
, where each_traj holds your data.
I am very new to Ruby and I want to check for rows with the same phone number in a CSV file.
What I am trying to do is go through the input CSV file and copy each row from the input file to the output file, adding another column called "duplicate" to the output file. While copying data from input to output, I check whether a similar phone number is already in the output file; if the phone number already exists, I add "dupl" to that row in the duplicate column.
This is what I have.
require 'csv'

file = CSV.read('input_file.csv')
output_file = File.open("output2.csv", "w")
for row in file
  # row is an array of fields; join it back into a CSV line
  output_file.write(row.join(","))
  output_file.write("\n")
end
output_file.close
Example input file:
Phone
(202) 221-1323
(201) 321-0243
(202) 221-1323
(310) 343-4923

Output file:
Phone,Duplicate
(202) 221-1323,
(201) 321-0243,
(202) 221-1323,dupl
(310) 343-4923,
So basically you want to write the input to output and append a "dupl" on the second occurrence of a duplicate?
Your input-to-output copying seems fine. To get the "dupl" flag, simply count the occurrences of each number in the list. If it's more than one, it's a duplicate. But since you only want the flag to be shown on the second occurrence, just count how often the number appeared up until that point:
require 'csv'

lines = CSV.read('input_file.csv')
output_file = File.open("output2.csv", "w")
lines.each_with_index do |l, i|
  output_file.write(l.join(",") + ",")
  # has this row already appeared before the current position?
  if lines.take(i).count(l) >= 1
    output_file.write("dupl")
  end
  output_file.write("\n")
end
output_file.close
l is the current line. take(i) is all lines before, but not including, the current line, and count(l) applied to this counts how often the number appeared before; if it has already appeared at least once, print a "dupl".
There probably is a more efficient answer to this; this is just a quick and easy-to-understand version.
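For larger files, a single pass that remembers the numbers already seen avoids re-scanning earlier rows each time. A rough sketch of that idea using a Set (this is an addition, not part of the answer above; it treats the header line like any other row):

require 'csv'
require 'set'

seen = Set.new
CSV.open("output2.csv", "w") do |out|
  CSV.foreach("input_file.csv") do |row|
    # flag the row if its phone number was already seen in an earlier row
    flag = seen.include?(row[0]) ? "dupl" : ""
    out << (row + [flag])
    seen << row[0]
  end
end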
To explain my problem I use this example data set:
SampleID   Date          Project   Problem
03D00173   03-Dec-2010             1,00
03D00173   03-Dec-2010             1,00
03D00173   28-Sep-2009   YNTRAD
03D00173   28-Sep-2009   YNTRAD
Now, the problem is that I need to replace the text "YNTRAD" with "YNTRAD_PILOT" but only for the cases with Date = 28-Sep-2009.
This example is part of a much larger database, with many more cases having Project=YNTRAD and Date=28-Sep-2009, so I cannot simply select first all cases with 28-Sep-2009, then check which of these cases have Project=YNTRAD, and then replace. Instead, what I need to do is:
Look at each case that has a 1,00 in Problem (these are problem cases).
Then find the SampleID that corresponds with that sample.
Then find all other cases with the same SampleID BUT WITH Date=28-Sep-2009 (this is needed because only those samples are part of a pilot study), and then replace YNTRAD in Project with YNTRAD_PILOT.
I read a lot about:
- LOOP
- DO REPEAT
- DO IF
but I don't know how to use these in solving this problem.
I first tried making a list containing only the SampleIDs that eventually need to be changed (again, this is part of a much larger database).
STRING SampleID2 (A20).
IF (Problem=1) SampleID2=SampleID.
EXECUTE.
AGGREGATE
/OUTFILE=*
/BREAK=SampleID2
/n_SampleID2=N.
This gives a dataset with only the SampleIDs for which a change should be made. However, I don't know how to read this dataset case by case, look up each SampleID in the overall file with all the data, and then change only those cases where Date = 28-Sep-2009.
It sounds like once we can identify the IDs that need to be changed we've done the tricky part here. We can use AGGREGATE with MODE=ADDVARIABLES to add a problem Id counter variable to our dataset. From there, it's as you'd expect.
* Add var IdProblemCnt to your database - stores # of times a given Id had a record with Problem = 1 .
AGGREGATE
/OUTFILE=* MODE=ADDVARIABLES
/BREAK=SampleId
/IdProblemCnt=CIN(Problem, 1, 1) .
EXE .
* once we've identified the "problem" Ids we can use `RECODE` Project var.
DO IF (IdProblemCnt > 0 AND Date = DATE.MDY(9,28,2009)) .
RECODE Project ('YNTRAD' = 'YNTRAD_PILOT') .
END IF .
EXE .
I have a large file (200K - 300K lines of text).
It's almost but not quite a CSV file.
The column headers are on the second row; there's a row of dummy text before that.
There are rows interspersed with the actual data rows. They have commas, but most of the columns are blank. They aren't relevant to me.
I need to read this file efficiently, and parse the lines that actually are valid, as CSV data.
My first idea was to write a clean procedure that strips out the first line and the blank lines, leaving only the headers and details that I want in a CSV file that the CsvParser can read.
This is easy enough: just ReadLine from a StreamReader, and I can keep or disregard each line just by looking at it as a string.
Now though I have a new issue.
There is a column in the valid data that I can use to disregard a whole lot more rows.
If I read the Cleaned file using the CsvParser it's easy to filter by that column.
But I don't really want to waste time writing rows I don't need to the clean file.
I'd like to be able to check that column while cleaning the file. But at that point I'm working with strings representing entire lines, and it's not easy to get at the specific column I want.
I can't Split on ','; there may be commas in the text of other columns.
I end up writing the CSV parsing logic that I was using the CsvParser for in the first place.
Ideally, I'd like to read in the existing file, clean out the lines that I can based on strings, then somehow parse the resulting seq using the CsvParser.
I see CsvFile can Load from Streams and Readers, but I'm not sure that's much help.
Any suggestions, or am I just asking too much? Should I just deal with the extra filtering when loading the cleaned file?
You can avoid doing most of the work of parsing by using the CsvFile class directly.
The F# Data documentation has some extended examples that show how to do this in some detail.
Skipping over lines at the start of a file is handled by the skipRows parameter. Passing the ignoreErrors parameter will also ignore rows that fail to parse.
open FSharp.Data

let csv = CsvFile.Load(file, skipRows=1, ignoreErrors=true)
for row in csv.Rows do
    printfn "%s" (row.GetColumn "Name")
If you have to do more complex filtering of rows, a simple approach that doesn't require temporary files is to filter the results of File.ReadLines and pass that to CsvFile.Parse.
The example below skips a six-line prelude, reads in lines until it hits a blank line, uses CsvFile to parse the data, and finally filters the resulting rows to those of interest.
open System.IO
open FSharp.Data
open FSharp.Data.CsvExtensions

let tableA =
    File.ReadLines(file)
    |> Seq.skip 6
    |> Seq.takeWhile (fun l -> String.length l > 0)
    |> String.concat "\n"

let csv = CsvFile.Parse(tableA)
for row in csv.Filter(fun row -> row?Close.AsFloat() > row?Open.AsFloat()).Rows do
    printfn "%s" (row.GetColumn "Name")
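If the junk rows are interspersed with the data rather than forming one contiguous block, the same approach works by filtering the raw lines before parsing. A rough sketch, where file is your path and the "nothing but commas and whitespace" test is only an assumption about what marks a junk row:

open System.IO
open FSharp.Data

// keep a line only if it has content beyond commas and whitespace
let isRealRow (line: string) =
    line |> Seq.exists (fun c -> c <> ',' && not (System.Char.IsWhiteSpace c))

let cleaned =
    File.ReadLines(file)
    |> Seq.skip 1              // drop the dummy first line; the headers come next
    |> Seq.filter isRealRow
    |> String.concat "\n"

let csv = CsvFile.Parse(cleaned)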
I am trying to iterate through an unstructured CSV file (it has no specific headings). The file is generated by an instrument. I need to select specific rows that have specific column values and create another file. Below is an example of the file layout:
,success, (row1)
1,2,protocol (row2)
78,f14,34(row3)
,67,34(row4)
,f14,34(row5)
3,f14,56,56(row6)
I need to select all rows with the 'f14' value. Below is the code:
import csv
import sys

reader = csv.reader(open('c:/test_file.csv', newline=''), delimiter=',', quotechar='|')
for row in reader:
    print(','.join(row))
I am unable to go beyond this point.
You're almost there:
for row in reader:
    if row[1] == 'f14':
        print(','.join(row))
You just need to check whether the row is one you're interested in or not by checking the value of the column and seeing if it's what you're looking for. That could be done with a simple if row[1] == 'f14' conditional statement. However, that would fail on any blank lines -- which it looks like your input file may have -- so you'd need to preface that check with another to make sure the row had at least that many columns in it.
To create another CSV file with just those rows in it, all you'd need to do is write each row that passes all the checks to another file opened for output -- instead of, or in addition to, printing the row out. Here's a very concise way of just writing the rows to another file.
(Note: I'm not sure why you had the quotechar='|' in your code on the csv.reader() call, because there aren't any quote characters in the input file shown, so I left it out in the code below -- you might need to add it back if indeed that's what it would be if there were any.)
import csv

with open('test_file.csv', newline='') as infile, \
     open('test_file_out.csv', 'w', newline='') as outfile:
    csv.writer(outfile).writerows(row for row in csv.reader(infile)
                                  if len(row) >= 2 and row[1] == 'f14')
Contents of the 'test_file_out.csv' file afterwards:
78,f14,34(row3)
,f14,34(row5)
3,f14,56,56(row6)
I'm using the TwitteR package (specifically, the searchTwitter function) to export, in CSV format, all the tweets containing a specific hashtag.
I would like to analyze their text and discover how many of them contain a specific list of words that I have just saved in a file called importantwords.txt.
How can I create a function that could return me a score of how many tweets contain the words that I have written in my file importantwords.txt?
Pseudocode:
for (every word in importantwords.txt):
    int i = 0
    for (every line in tweets.csv):
        if (line contains(word)):
            i = i + 1
    print(word: i)
Is that along the lines of what you wanted?
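A rough R version of that pseudocode, assuming importantwords.txt has one word per line and the exported CSV keeps the tweet text in a column named text (both file layouts are assumptions):

words  <- readLines("importantwords.txt")
tweets <- read.csv("tweets.csv", stringsAsFactors = FALSE)

# per-word count: how many tweets mention each word at least once
counts <- sapply(words, function(w) sum(grepl(w, tweets$text, ignore.case = TRUE)))
print(counts)

# overall score: tweets containing at least one of the important words
hits <- Reduce(`|`, lapply(words, function(w) grepl(w, tweets$text, ignore.case = TRUE)))
print(sum(hits))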
I think best bet is to use the tm package.
http://cran.r-project.org/web/packages/tm/index.html
This fella uses it to create Word Clouds with the information. Looking through his code will probably help you out too.
http://davetang.org/muse/2013/04/06/using-the-r_twitter-package/
If your important-words list is just there to drop "the", "a", and things like that, this will work fine. If it's for something in particular, you'll need to loop over the corpus with your list of words, retrieving the counts.
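A sketch of that loop-over-the-corpus idea with tm, reusing the words and tweets objects from the sketch above; the dictionary control option restricts the term matrix to just your word list (the text column name is, again, an assumption):

library(tm)

corpus <- VCorpus(VectorSource(tweets$text))
tdm <- TermDocumentMatrix(corpus,
                          control = list(tolower = TRUE, dictionary = words))

m <- as.matrix(tdm)
print(rowSums(m))       # total occurrences of each important word
print(rowSums(m > 0))   # number of tweets containing each word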
Hope it helps
Nathan