Build CSV from parse files python - parsing

I am building a small database (for personal use) from over 1,000 files. I am looking for a specific word in each file, but if the word is not in the file, how can I write a NoData line? What I would like to have is:
Africa Botswana test 51.1922546 -113.9366341
Africa Kenya Skydive Kenya -13.788388 33.78498
Africa Malawi Skydive Malawi NoData NoData
Africa Mauritius SkyDive Austral 30.5000854 -8.824510574
Africa Morocco Beni Mellal NoData NoData
for i in os.listdir(Main_Path):
    if "-" in i:
        for filename in os.listdir(Main_Path+i):
            if ".dat" in filename and os.path.isdir(Main_Path+i):
                f_split = filename.split("-")
                if len(f_split) == 4:
                    continent.append(f_split[0])
                    country.append(f_split[1])
                    state.append(f_split[2].split(".")[0])
                else:
                    continent.append(f_split[0])
                    country.append("")
                    state.append(f_split[1].split(".")[0])
                d = open(Main_Path+i+"/" + filename, "r")
                files = d.readlines()
                d.close()
                for k, line in enumerate(files):
                    if "Dropzone.com :" in line:
                        dzname.append(line.split(":")[1].strip())
                    elif 'id="lat"' in line:
                        lat.append(line.split("=")[3].split('"')[1].strip())
myFile = open(Main_Path+"MYFILE.csv", "wb")
wtr= csv.writer( myFile )
for a,b,c,d,e in zip(continent,country,state,dzname,lat):
    wtr.writerow([a,b,c,d,e])
myFile.close()
I am stuck at "elif 'id="lat"' in line:" because it appends to the list "lat" only for the files that contain id="lat". I understand why, but I would like the parser to append NoData to the list when a file has no such line.
Sorry, I wrote the question from another computer.

Do you mean something like this?
That is, if no line in files contains id="lat", it appends "No Data" to lat.
snip...
d = open(Main_Path+i+"/" + filename, "r")
files = d.readlines()
d.close()
found_latitude = False
for k, line in enumerate(files):
    if "Dropzone.com :" in line:
        dzname.append(line.split(":")[1].strip())
    elif 'id="lat"' in line:
        found_latitude = True
        lat.append(line.split("=")[3].split('"')[1].strip())
if not found_latitude:
    lat.append("No Data")
snip...
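The same idea can also be wrapped in a small helper so that every file contributes exactly one value to the list, whether or not the marker is present. This is only a minimal sketch, not the asker's code: `extract_lat`, `VALUE_RE` and the sample lines are hypothetical, and it assumes the coordinate sits in a `value="..."` attribute on the line that contains `id="lat"`. Adjust the parsing to the real markup.

```python
import re

# Hypothetical pattern: assumes the coordinate is stored in a value="..." attribute.
VALUE_RE = re.compile(r'value="([^"]*)"')


def extract_lat(lines, default="NoData"):
    """Return the latitude from the first line containing id="lat", or default."""
    for line in lines:
        if 'id="lat"' in line:
            match = VALUE_RE.search(line)
            if match:
                return match.group(1).strip()
    # No usable line was found, so fall back to the placeholder.
    return default


# Made-up sample data, for illustration only.
with_lat = ['<h1>Dropzone.com : Skydive Kenya</h1>',
            '<input type="hidden" id="lat" value="-13.788388">']
without_lat = ['<h1>Dropzone.com : Skydive Malawi</h1>']

lat = [extract_lat(with_lat), extract_lat(without_lat)]
print(lat)
```

Because the helper always returns a value, the lists passed to zip() stay the same length and no rows are silently dropped from the CSV.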

Related

Removing the file paths and using the file number to perform some calculations while plotting

I am trying to read .txt files from a directory; they are named as follows:
x-23.txt
x-43.txt
x-83.txt
:
:
x-243.txt
I list these files using filename = system("ls ../Data/*.txt"). The goal is to load these files and plot certain columns. At the same time, I am trying to parse the file names down to the numbers shown below, so that I can use them as titles in the plot and add them to or subtract them from a certain column:
23
43
83
:
:
243
For that, I tried the following:
dirname = '../Data/'
str = system('echo "'.dirname. '" | perl -pe ''s/x[\d-](\d+).txt/\1.\2/'' ')
cv = word(str, 1)
The lines above do not seem to strip the file names down to the numbers. The code all together:
filelist1 = system("ls ../Data/*.txt")
print filelist1
dirname = '../Data/'
str = system('echo "'.dirname. '" | perl -pe ''s/x[\d-](\d+).txt/\1.\2/'' ')
cv = word(str, 1)
plot for [filename1 in filelist1] filename1 using (-cv/1000+ Tx($4)):(X($3)) with points pt 7 lc 6 title system('basename '.filename1),\
After parsing the .txt file names, I want to use the file numbers "cv" and subtract them from column Tx($4) while plotting.
directory = "../temp/"
filelist = system("cd ../temp/ ; ls *.txt")
files = words(filelist)
filename(i) = directory . word(filelist,i)
title(i) = word(filelist,i)[3 : strstrt(word(filelist,i),'.')-1]
plot for [i=1:files] filename(i) using ... title title(i)
Test case (edited to show pulling files from another directory):
gnuplot> print filelist
x-234.txt
x-23.txt
x-2.txt
x-34.txt
gnuplot> do for [i=1:files] { print i, ": ", filename(i) }
1: ../temp/x-234.txt
2: ../temp/x-23.txt
3: ../temp/x-2.txt
4: ../temp/x-34.txt
gnuplot> plot for [i=1:files] x*i title title(i)
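For readers who prepare the file list outside gnuplot, the same extraction can be sketched in Python. This is an illustrative equivalent only, not part of the gnuplot answer: `number_from_name` is a hypothetical helper, and it assumes every name has the form `x-<digits>.txt` as in the question.

```python
import os
import re


def number_from_name(path):
    """Return the integer between 'x-' and '.txt', or None if the name does not match."""
    match = re.fullmatch(r'x-(\d+)\.txt', os.path.basename(path))
    return int(match.group(1)) if match else None


# File names taken from the test case above.
names = ['../temp/x-234.txt', '../temp/x-23.txt', '../temp/x-2.txt', '../temp/x-34.txt']
numbers = [number_from_name(n) for n in names]
print(numbers)
```

Returning an integer rather than a string makes the value directly usable in arithmetic, which is what the question needs for the column offset.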

Using a single pattern to capture multiple values containing in a file in lua script

I have a text file that contains data in the format YEAR, CITY, COUNTRY, with one record per line, e.g.:
1896, Athens, Greece
1900, Paris, France
Previously I was using hard-coded data like this:
local data = {}
data[1] = { year = 1896, city = "Athens", country = "Greece" }
data[2] = { year = 1900, city = "Paris", country = "France" }
data[3] = { year = 1904, city = "St Louis", country = "USA" }
data[4] = { year = 1908, city = "London", country = "UK" }
data[5] = { year = 1912, city = "Stockholm", country = "Sweden" }
data[6] = { year = 1920, city = "Antwerp", country = "Belgium" }
Now I need to read the lines from the file and load the values into the private knowledge base "local data = {}".
I can't figure out how to capture multiple values from each line of the file using a single pattern.
My code so far is:
local path = system.pathForFile( "olympicData.txt", system.ResourceDirectory )
-- Open the file handle
local file, errorString = io.open( path, "r" )
if not file then
    -- Error occurred; output the cause
    print( "File error: " .. errorString )
else
    -- Read each line of the file
    for line in file:lines() do
        local i, value = line:match("%d")
        table.insert(data, i)
    end
    -- Close the file
    io.close(file)
end
file = nil
Given that you read a line like
1896, Athens, Greece
you can obtain the desired values using captures.
https://www.lua.org/manual/5.3/manual.html#6.4.1
Captures: A pattern can contain sub-patterns enclosed in parentheses; they describe captures. When a match succeeds, the
substrings of the subject string that match captures are stored
(captured) for future use. Captures are numbered according to their
left parentheses. For instance, in the pattern "(a*(.)%w(%s*))", the
part of the string matching "a*(.)%w(%s*)" is stored as the first
capture (and therefore has number 1); the character matching "." is
captured with number 2, and the part matching "%s*" has number 3.
As a special case, the empty capture () captures the current string
position (a number). For instance, if we apply the pattern "()aa()" on
the string "flaaap", there will be two captures: 3 and 5.
local example = "1896, Athens, Greece"
-- [^,]+ is used instead of %w+ so that cities containing spaces (e.g. "St Louis") also match
local year, city, country = example:match("(%d+),%s*([^,]+),%s*(.+)")
print(year, city, country)

Text file reading with dict in python

Read the text file below into a Python program and group all the words according to their first letter. Represent the groups as a dictionary, where the starting letter is the key and the list of all words starting with that letter is the value.
Text file is:
Among other public buildings in a certain town, which for many reason it will be prudent to
refine from mentioning, and to which i will assign no fictitious name, there is one anciently
common to most towns, great or small.
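# Read the whole file; the with-block closes it automatically.
with open('file name', 'r') as stream:
    text = stream.read()
# split() with no argument splits on any whitespace (including newlines)
# and never yields empty strings, so w[0] is always safe.
words = text.split()
groups = {}
for w in words:
    # Group each word under its first character.
    if w[0] not in groups:
        groups[w[0]] = [w]
    else:
        groups[w[0]].append(w)
The resulting dictionary is groups. The names text and groups are used instead of str and dict so that the built-in types are not shadowed.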
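A variant of the same grouping can be sketched with collections.defaultdict, which removes the need to check whether a key already exists. This is a sketch under stated assumptions rather than the answer's method: `group_by_first_letter` is a hypothetical name, and the choices to lowercase the key and strip surrounding punctuation are assumptions that the exercise does not spell out.

```python
from collections import defaultdict


def group_by_first_letter(text):
    """Map each lowercase first letter to the list of words starting with it."""
    groups = defaultdict(list)
    for word in text.split():
        # Assumption: surrounding punctuation is not part of the word.
        cleaned = word.strip('.,;:!?"\'')
        if cleaned:
            # Assumption: 'Among' and 'a' belong to the same group, keyed by 'a'.
            groups[cleaned[0].lower()].append(cleaned)
    return dict(groups)


sample = "Among other public buildings in a certain town, great or small."
result = group_by_first_letter(sample)
print(result['o'])
```

If case-sensitive keys are wanted instead, dropping the .lower() call is the only change needed.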

Using AKSampleDescriptor

I am using an adapted AKSampler example in which I try to use the Sforzando output of Fluid.sf3 melodicSounds. Sforzando creates an .sfz file for each instrument, but all of them point, via the global sample, to one huge .wav file.
In every instrument .sfz file there is an offset and an endpoint describing the part of the .wav file to be used.
When I load the .sfz file I get a crash due to memory problems. It seems that for every region defined in the .sfz file the complete .wav file (140 MB) is loaded again.
Most likely, loading the sample file with AKSampleDescriptor, as done in the AKSampler example, ignores the offset and endpoint (AKSampleDescriptor.startPoint and AKSampleDescriptor.endPoint) and reloads the complete .wav file each time.
Is there a way to load just the wanted start-to-end part of the sample file? The complete file holds the sample data for all the instruments. (I know and use polyphony that extracts only one instrument at a time and works fine, but this is for another use.)
Or, and this seems best to me, just load the file once and then have the sample descriptors point to the data in memory.
Good suggestions, Rob. I just ran into this one-giant-WAV issue myself, having never seen it before. I was also using Sforzando for conversion. I'll look into adding the necessary capabilities to AKSampler. In the meantime, it might be easier to write a program to cut up the one WAV file into smaller pieces and adjust the SFZ accordingly.
Here is some Python 2.7 code to do this, which I have used successfully with a Sforzando-converted sf2 soundfont. It might need changes to work for you--there is huge variability among sfz files--but at least it might help you get started. This code requires the PyDub library for manipulating WAV audio.
import os
import re
from pydub import AudioSegment
def stripComments(text):
    def replacer(match):
        s = match.group(0)
        if s.startswith('/'):
            return " " # note: a space and not an empty string
        else:
            return s
    pattern = re.compile(
        r'//.*?$|/\*.*?\*/|\'(?:\\.|[^\\\'])*\'|"(?:\\.|[^\\"])*"',
        re.DOTALL | re.MULTILINE
    )
    return re.sub(pattern, replacer, text)
def updateSplitList(splitList, regionLabels, values):
    if len(values) > 3:
        start = int(values['offset'])
        length = int(values['end']) - start
        name = regionLabels.pop(0)
        splitList.add((name, start, length))
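def lookupSplitName(splitList, offset, end):
    # splitList holds (name, start, length) tuples, so compare against the region length
    for (name, start, length) in splitList:
        if offset == start and end - offset == length:
            return name
    return None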
def outputGroupAndRegion(outputFile, splitList, values):
    if values.has_key('lokey') and values.has_key('hikey') and values.has_key('pitch_keycenter'):
        outputFile.write('<group> lokey=%s hikey=%s pitch_keycenter=%s\n' % (values['lokey'], values['hikey'], values['pitch_keycenter']))
    elif values.has_key('key') and values.has_key('pitch_keycenter'):
        outputFile.write('<group> key=%s pitch_keycenter=%s\n' % (values['key'], values['pitch_keycenter']))
    if len(values) > 3:
        outputFile.write(' <region> ')
        if values.has_key('lovel') and values.has_key('hivel'):
            outputFile.write('lovel=%s hivel=%s ' % (values['lovel'], values['hivel']))
        if values.has_key('tune'):
            outputFile.write('tune=%s ' % values['tune'])
        if values.has_key('volume'):
            outputFile.write('volume=%s ' % values['volume'])
        if values.has_key('offset'):
            outputFile.write('offset=0 ')
        if values.has_key('end'):
            outputFile.write('end=%d ' % (int(values['end']) - int(values['offset'])))
        if values.has_key('loop_mode'):
            outputFile.write('loop_mode=%s ' % values['loop_mode'])
        if values.has_key('loop_start'):
            outputFile.write('loop_start=%d ' % (int(values['loop_start']) - int(values['offset'])))
        if values.has_key('loop_end'):
            outputFile.write('loop_end=%d ' % (int(values['loop_end']) - int(values['offset'])))
        outputFile.write('sample=samples/%s' % lookupSplitName(splitList, int(values['offset']), int(values['end'])) + '.wav\n')
def process(inputFile, outputFile):
    # create a list of region labels
    regionLabels = list()
    for line in open(inputFile):
        if line.strip().startswith('region_label'):
            regionLabels.append(line.strip().split('=')[1])
    # read entire input SFZ file
    sfz = open(inputFile).read()
    # strip comments and create a mixed list of <header> tags and key=value pairs
    sfz_list = stripComments(sfz).split()
    inSection = "none"
    default_path = ""
    global_sample = None
    values = dict()
    splitList = set()
    # parse the input SFZ data and build up splitList
    for item in sfz_list:
        if item.startswith('<'):
            inSection = item
            updateSplitList(splitList, regionLabels, values)
            values.clear()
            continue
        elif item.find('=') < 0:
            #print 'unknown:', item
            continue
        key, value = item.split('=')
        if inSection == '<control>' and key == 'default_path':
            default_path = value.replace('\\', '/')
        elif inSection == '<global>' and key == 'sample':
            global_sample = value.replace('\\', '/')
        elif inSection == '<region>':
            values[key] = value
    # split the wav file
    bigWav = AudioSegment.from_wav(global_sample)
    #print "%d channels, %d bytes/sample, %d frames/sec" % (bigWav.channels, bigWav.sample_width, bigWav.frame_rate)
    frate = float(bigWav.frame_rate)
    for (name, start, length) in splitList:
        startMs = 1000 * start / frate
        endMs = 1000 * (start + length) / frate
        wav = bigWav[startMs : endMs]
        wavName = 'samples/' + name + '.wav'
        wav.export(wavName, format='wav')
    # parse the input SFZ data again and generate the output SFZ
    for item in sfz_list:
        if item.startswith('<'):
            inSection = item
            outputGroupAndRegion(outputFile, splitList, values)
            values.clear()
            continue
        elif item.find('=') < 0:
            #print 'unknown:', item
            continue
        key, value = item.split('=')
        if inSection == '<control>' and key == 'default_path':
            default_path = value.replace('\\', '/')
        elif inSection == '<global>' and key == 'sample':
            global_sample = value.replace('\\', '/')
        elif inSection == '<region>':
            values[key] = value
dirPath = '000'
fileNameList = os.listdir(dirPath)
for fileName in fileNameList:
    if fileName.endswith('.sfz'):
        inputFile = os.path.join(dirPath, fileName)
        outputFile = open(fileName, 'w')
        print fileName
        process(inputFile, outputFile)
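The arithmetic at the heart of the script is small enough to check in isolation: once a region has been cut out of the big WAV file, every sample position in the new SFZ must be expressed relative to the region's old offset, and the cut points are converted from sample frames to milliseconds. The sketch below restates just that step. `rebase_region` and `frames_to_ms` are hypothetical helper names that do not appear in the script above, the numbers are made up, and the code is written to run on Python 3 without PyDub.

```python
def rebase_region(values):
    """Return a copy of the region opcodes with positions made relative to 'offset'."""
    offset = int(values['offset'])
    rebased = dict(values)
    # The split file starts where the region started, so its own offset is 0.
    rebased['offset'] = 0
    for key in ('end', 'loop_start', 'loop_end'):
        if key in values:
            rebased[key] = int(values[key]) - offset
    return rebased


def frames_to_ms(frames, frame_rate):
    """Convert a position in sample frames to milliseconds."""
    return 1000.0 * frames / frame_rate


# Made-up region values, stored as strings the way they come out of an SFZ file.
region = {'offset': '44100', 'end': '88200', 'loop_start': '50000', 'loop_end': '80000'}
rebased = rebase_region(region)
print(rebased)
print(frames_to_ms(int(region['offset']), 44100))
```

Keeping this step separate from the file handling makes it easy to verify by hand before running the full conversion on a large soundfont.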

Display exact Requirement IDs using dxl in Doors

I am parsing a requirement module in DOORS and I want to get the outlinked requirements, so I did this:
Stream outfile= write("D:\\Users\\iiii\\" reportName ".txt")
outfile << "Spec Report Requirement IDs\n-----------------------------\n"
Object o
Module m = read(planSpecReportPath_inDoors)
Link outLink
ModName_ parentModName
for o in m do
{
    for outLink in o -> "*" do
    {
        parentModName = target(outLink)
        string h = fullName(parentModName) "\n\n"
        outfile << h
    }
}
However, I ONLY get the paths of the linked requirement documents and can't get the exact requirement IDs.
My question: if I want to get all outlinks to a specific requirement module with the requirement IDs, not just the requirement document path, what should I do? Any help?
You will need the perm
int targetAbsNo (Link)
So, in your example, something like:
parentModName = target(outLink)
int iTarget = targetAbsNo(outLink)
string h = fullName(parentModName) " (" iTarget ")" "\n\n"

Resources