How do I create a program that monitors a file (a text file, for example), so that when something new is written to the file it reports that something was added, and when part of the text is deleted it reports that something was deleted?
It should also print to the console exactly which words were deleted or added.
Explanation
I use watchdog to follow the file.
On instantiation of the handler, I read the file's size.
When the file is modified, watchdog calls the on_modified function.
When this method is called, I compare the file's current size to its previous size to determine if the change was additive or subtractive.
You have a few other options when it comes to tracking the file. For example, you could also compare:
the number of lines
the number of words
the number of characters
the exact contents of the file (see the word-by-word diff sketch after the example below)
import os
import time

from watchdog.observers import Observer
from watchdog.events import FileSystemEventHandler


class EventHandler(FileSystemEventHandler):
    def __init__(self, file_path_to_watch):
        self.file_path_to_watch = file_path_to_watch
        self._file_size = self._read_file_size()

    def _read_file_size(self):
        return os.path.getsize(self.file_path_to_watch)

    def _print_change(self, new_file_size):
        if new_file_size > self._file_size:
            print('File modified with additions')
        elif new_file_size < self._file_size:
            print('File modified with deletions')

    def on_modified(self, event):
        if event.src_path != self.file_path_to_watch:
            return
        new_file_size = self._read_file_size()
        self._print_change(new_file_size)
        self._file_size = new_file_size


if __name__ == "__main__":
    file_to_watch = '/path/to/watch.txt'
    event_handler = EventHandler(file_to_watch)
    observer = Observer()
    observer.schedule(event_handler, path=file_to_watch, recursive=False)
    observer.start()
    try:
        while True:
            time.sleep(1)
    except KeyboardInterrupt:
        observer.stop()
    observer.join()
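The size comparison above only tells you whether text was added or removed. To print exactly which words changed, you can instead keep the previous contents of the file and diff them against the new contents with difflib from the standard library. Below is a minimal sketch of that idea, not a drop-in replacement for the handler above; it assumes the file is UTF-8 text and that splitting on whitespace is a good enough notion of a "word".

import difflib


class WordDiffHandler(FileSystemEventHandler):
    def __init__(self, file_path_to_watch):
        self.file_path_to_watch = file_path_to_watch
        self._words = self._read_words()

    def _read_words(self):
        # Read the whole file and split it into words on whitespace.
        with open(self.file_path_to_watch, encoding='utf-8') as f:
            return f.read().split()

    def on_modified(self, event):
        if event.src_path != self.file_path_to_watch:
            return
        new_words = self._read_words()
        # ndiff marks inserted items with '+ ' and removed items with '- '.
        for token in difflib.ndiff(self._words, new_words):
            if token.startswith('+ '):
                print('Added:', token[2:])
            elif token.startswith('- '):
                print('Deleted:', token[2:])
        self._words = new_words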
Using Biopython, I have a list of atoms: rep_atoms = [CA, CB, CD3] (carbon atoms).
I want to keep only these atoms from any given PDB file. I don't want to save the result locally; I want it kept in memory (there are lots of iterations).
I have arrived at the code below, but it saves the file locally and is very slow.
So my goal is: for each atom in the PDB, if it is present in rep_atoms, store only that information in new_pdb, so that when I call it later in my code it is a PDB file without ever being saved to a local folder on my computer.
How do I append each atom? Printing all atoms is very fast. I could append them somewhere, but then it wouldn't be a PDB structure file. What should I do?
from Bio.PDB import PDBIO, Select  # ... other imports omitted ...


class rep_atom_Select(Select):
    def accept_atom(self, atom):
        if atom.get_name() in rep_atoms:
            return 1
        else:
            return 0


def rep_atoms_pdb(input_pdb):
    io = PDBIO()
    io.set_structure(input_pdb)
    for model in input_pdb:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.get_name() in rep_atoms:
                        print(atom)
                        # dnr_only = io.save("dnr_only.pdb", rep_atom_Select())
Save after the loop, once, instead of thousands of times inside the loop.
def rep_atoms_pdb(input_pdb):
    my_atoms = list()
    for model in input_pdb:
        for chain in model:
            for residue in chain:
                for atom in residue:
                    if atom.get_name() in rep_atoms:  # or if rep_atom_Select().accept_atom(atom):
                        my_atoms.append(atom)  # or something like this
    # The function returns the list of extracted atoms
    return my_atoms
Your definition of rep_atom_Select() does not seem to be directly compatible with this design, nor am I sure receiving the atoms as a list is actually what you want, but this should at least give you a nudge in the right direction.
Brief reading of the Bio.PDB.PDBIO documentation suggests that you might simply want to return the actual PDBIO object. I think something like this:
class rep_atom_Select(Select):
    def accept_atom(self, atom):
        if atom.get_name() in rep_atoms:
            return 1
        else:
            return 0


def rep_atoms_pdb(input_pdb):
    io = PDBIO()
    io.set_structure(input_pdb)
    return io  # pass rep_atom_Select() later, when calling io.save()
This is based on a very cursory reading of the documentation, but at least demonstrates how you would use your overridden class to select only some of the atoms in the input_pdb structure.
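If the goal is really to keep the filtered structure in memory rather than on disk, one option is to pass an in-memory text handle to PDBIO.save() together with the Select subclass, since save() accepts an open file handle as well as a filename. This is only a sketch under that assumption: rep_atoms is copied from the question, and RepAtomSelect is just a renamed version of the class above.

from io import StringIO

from Bio.PDB import PDBIO, Select

rep_atoms = ["CA", "CB", "CD3"]  # atom names as given in the question


class RepAtomSelect(Select):
    def accept_atom(self, atom):
        # Keep only atoms whose names appear in rep_atoms.
        return atom.get_name() in rep_atoms


def rep_atoms_pdb_text(input_pdb):
    io = PDBIO()
    io.set_structure(input_pdb)
    handle = StringIO()
    io.save(handle, RepAtomSelect())  # write the filtered PDB into memory
    return handle.getvalue()          # PDB-formatted text, never written to disk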
I am trying to bulk-download the text visible to the "end user" from 10-K SEC EDGAR reports (I don't care about tables) and save it in a text file. I found the code below on YouTube, but I am facing two challenges:
I am not sure whether I am capturing all of the text, and when I print the text fetched from the URL below, I get very strange output (special characters, e.g., at the very end of the printout).
I can't seem to save the text to .txt files; I'm not sure whether this is due to encoding (I am entirely new to programming).
import re
import requests
import unicodedata
from bs4 import BeautifulSoup


def restore_windows_1252_characters(restore_string):
    def to_windows_1252(match):
        try:
            return bytes([ord(match.group(0))]).decode('windows-1252')
        except UnicodeDecodeError:
            # No character at the corresponding code point: remove it.
            return ''
    return re.sub(r'[\u0080-\u0099]', to_windows_1252, restore_string)


# define the URL of the specific html_text file
new_html_text = r"https://www.sec.gov/Archives/edgar/data/796343/0000796343-14-000004.txt"

# grab the response
response = requests.get(new_html_text)
page_soup = BeautifulSoup(response.content, 'html5lib')
page_text = page_soup.html.body.get_text(' ', strip=True)

# normalize the text, remove characters. Additionally, restore missing Windows characters.
page_text_norm = restore_windows_1252_characters(unicodedata.normalize('NFKD', page_text))

# print: this works, however it gives me weird special characters (e.g., at the very end)
print(page_text_norm)

# save to file: this only gives me an empty text file
with open('testfile.txt', 'w') as file:
    file.write(page_text_norm)
Try this. If you include an example of the data you expect, it will be easier for people to understand your needs.
from simplified_scrapy import SimplifiedDoc,req,utils
url = 'https://www.sec.gov/Archives/edgar/data/796343/0000796343-14-000004.txt'
html = req.get(url)
doc = SimplifiedDoc(html)
# text = doc.body.text
text = doc.body.unescape() # Converting HTML entities
utils.saveFile("testfile.txt",text)
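If you would rather stay with requests and BeautifulSoup, the empty output file is quite possibly an encoding problem: open() without an explicit encoding uses the platform default (a legacy code page on Windows), and writing characters outside that code page can raise an error and leave the file empty or truncated. Here is a small sketch of the save step with an explicit encoding; the filename and the errors policy are just examples:

# Write the extracted text with an explicit encoding so characters outside
# the local code page cannot break the write.
with open('testfile.txt', 'w', encoding='utf-8', errors='replace') as file:
    file.write(page_text_norm)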
So I am trying to create a Kivy app that allows a user to control and monitor various hardware components. Part of the code builds and continuously updates an Excel worksheet that imports temperature readings from the hardware's COM port, along with a timestamp. I have been able to implement all of this so far, but I am unable to interact with the Kivy app while the Excel worksheet is being built/updated (i.e. while my hardware test is underway), which leaves me unable to use the app's features, such as the 'Pause' or 'Abort' buttons, until the worksheet is no longer being altered. So my question is: is it possible to export to an Excel file while simultaneously being able to use the Kivy app? And if so, how?
This is part of my code that sets up the Excel worksheet. Thank you in advance!
from kivy.app import App
from kivy.uix.screenmanager import Screen
from openpyxl import Workbook, load_workbook
import time


class HomeScreen(Screen):
    def build(self):
        return HomeScreen()

    def RunExcelFile(self):
        wb = Workbook()
        ws = wb.active
        a = 0
        i = 2
        while (a < 5):
            ws.cell('A1').value = 'Time'
            ws.cell('B1').value = 'Batch 1'
            ws.cell('C1').value = 'Batch 2'
            column = 'A'
            row = i
            time_cell = column + str(row)
            t = time.localtime()
            ws.cell(time_cell).value = time.asctime(t)
            a = (a + 1)
            i = (i + 1)
            time.sleep(1)
        wb.save("scatter.xlsx")
If you are doing some background job without touching widgets or properties, you can use the threading module without problems. Otherwise, you would need to use the @mainthread decorator or Clock.
import time
import threading


class HomeScreen(Screen):
    def run_excel_file(self):
        def job():
            for i in xrange(5):
                print i
                time.sleep(1)
            print 'job done'
        threading.Thread(target=job).start()
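If the background job does need to touch widgets or properties (for example, to show progress while the worksheet is being written), the update has to be handed back to the main thread. Below is a minimal sketch using the @mainthread decorator; the _job/_update_status method names and the status_label id are placeholders, not part of the original code.

import threading
import time

from kivy.clock import mainthread
from kivy.uix.screenmanager import Screen


class HomeScreen(Screen):
    def run_excel_file(self):
        # Run the long Excel-building job off the main thread so the UI stays responsive.
        threading.Thread(target=self._job).start()

    def _job(self):
        for i in range(5):
            time.sleep(1)
            # Never touch widgets here; hand the update over to the main thread instead.
            self._update_status('Logged row {}'.format(i))

    @mainthread
    def _update_status(self, text):
        # Runs on the Kivy main thread, so touching widgets/properties is safe here.
        self.ids.status_label.text = text  # 'status_label' is a hypothetical kv id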
I would like to be able to join the two "dictionaries" stored in "indata" and "pairdata", but this code,
indata = SeqIO.index(infile, infmt)
pairdata = SeqIO.index(pairfile, infmt)
indata.update(pairdata)
produces the following error:
indata.update(pairdata)
TypeError: update() takes exactly 1 argument (2 given)
I have tried using,
indata = SeqIO.to_dict(SeqIO.parse(infile, infmt))
pairdata = SeqIO.to_dict(SeqIO.parse(pairfile, infmt))
indata.update(pairdata)
which does work, but the resulting dictionaries take up too much memory to be practical for the sizes of infile and pairfile I have.
The final option I have explored is:
indata = SeqIO.index_db(indexfile, [infile, pairfile], infmt)
which works perfectly, but is very slow. Does anyone know how/whether I can successfully join the two indexes from the first example above?
SeqIO.index returns a read-only dictionary-like object, so update will not work on it (apologies for the confusing error message; I just checked in a fix for that to the main Biopython repository).
The best approach is either to use index_db, which will be slower but only needs to index the files once, or to define a higher-level object which acts like a dictionary over your multiple files. Here is a simple example:
from Bio import SeqIO


class MultiIndexDict:
    def __init__(self, *indexes):
        self._indexes = indexes

    def __getitem__(self, key):
        for idx in self._indexes:
            try:
                return idx[key]
            except KeyError:
                pass
        raise KeyError("{0} not found".format(key))


indata = SeqIO.index("f001", "fasta")
pairdata = SeqIO.index("f002", "fasta")
combo = MultiIndexDict(indata, pairdata)

print combo['gi|3318709|pdb|1A91|'].description
print combo['gi|1348917|gb|G26685|G26685'].description
print combo["key_failure"]
If you don't plan to use the index again and memory isn't a limitation (both of which appear to be true in your case), you can tell Bio.SeqIO.index_db(...) to use an in-memory SQLite3 index with the special index name ":memory:", like so:
indata = SeqIO.index_db(":memory:", [infile, pairfile], infmt)
where infile and pairfile are filenames, and infmt is their format type as defined in Bio.SeqIO (e.g. "fasta").
This is actually a general trick with Python's SQLite3 library. For a small set of files this should be much faster than building the SQLite index on disk.
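For completeness, here is a short usage sketch of the in-memory variant; the file names and the record ID are placeholders, not taken from the question.

from Bio import SeqIO

# Hypothetical file names and record ID, purely for illustration.
indata = SeqIO.index_db(":memory:", ["reads_1.fasta", "reads_2.fasta"], "fasta")
print(len(indata))                 # total number of records across both files
record = indata["some_record_id"]  # dictionary-style lookup by record ID
print(record.description)
indata.close()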
I have been trying to find a solution to this for some time. I have found questions and answers on recursion, but nothing that seemed to fit this particular situation.
I have written a class which should go through the given folder and all subfolders and rename files and folders if a particular search pattern is found.
Everything works as expected: replaceAllInDir gets called and replaces files and folders as needed. The next step is to do the same for all subfolders within the given folder.
So a subfolder gets identified and replaceAllInDir gets called from within itself. Let's assume the particular subfolder does not contain any subfolders. I would then expect that we return to the parent folder and continue looking for other subfolders. But instead control is not returned to the parent calling method and the program ends.
I am aware of other ways of solving the actual use case, but I cannot explain this behaviour of Ruby.
require 'fileutils'

class MultiFileAndFolderRename
  attr_accessor :rootDir, :searchPattern, :replacePattern

  def initialize(rootDir, searchPattern, replacePattern)
    @rootDir = rootDir
    @searchPattern = searchPattern
    @replacePattern = replacePattern
  end

  def execute
    replaceAllInDir(@rootDir)
  end

  def getValidDirEntries(dir)
    dirList = Dir.entries(dir)
    dirList.delete('.')
    dirList.delete('..')
    dirList
  end

  def replaceAllInDir(currentDir)
    Dir.chdir(currentDir)
    puts "Processing directory: " + Dir.pwd
    dirList = getValidDirEntries(currentDir)
    dirList.each { |dirEntry|
      attemptRename(dirEntry)
    }
    dirList = getValidDirEntries(currentDir)
    dirList.each { |dirEntry|
      if File.directory?(dirEntry)
        newDir = currentDir + '\\' + dirEntry
        rntemp = MultiFileAndFolderRename.new(newDir, 'searchString', 'replaceString')
        rntemp.replaceAllInDir(newDir)
      end
    }
  end

  def attemptRename(dirEntry)
    if dirEntry.match(@searchPattern)
      newname = dirEntry.to_s.sub(@searchPattern, @replacePattern)
      FileUtils.mv(dirEntry.to_s, newname)
    end
  end
end
You have a bug. The first line of replaceAllInDir() is Dir.chdir(). chdir() changes the directory of the current process on a global scale. It's not call-stack dependent. So later when you move into a subdirectory and change into that, the change becomes permanent even if you return from the recursion.
You need to change back to the correct directory after any call to replaceAllInDir(). For example:
...
dirList.each { |dirEntry|
  if File.directory?(dirEntry)
    ....
    rntemp.replaceAllInDir(newDir)
    Dir.chdir(currentDir)  # <- Restore us back to the correct directory
  end
}
I have tried your code and found numerous errors in it. Perhaps if you fix them, your idea will work.
You should include, at the end of a library like this, a part that allows it to be called from the shell: MultiFileAndFolderRename.new(ARGV[0], ARGV[1], ARGV[2]).execute if __FILE__ == $0. This ensures that when you call the Ruby code from the shell with ruby rename.rb test old new, your class will be instantiated and the search and replace patterns will be set accordingly.
You shouldn't change the current directory, because doing so means the line getValidDirEntries(currentDir) will no longer work. If you, for example, call it for the directory test and then change your current directory into test, getValidDirEntries('test') will not work as expected from inside that directory.
You should use only forward slashes instead of the doubled backslashes, so your code will also work on Linux and Mac OS X.
When you instantiate a new MultiFileAndFolderRename (which is not necessary), the arguments to the initializer are the wrong ones. Instead, you should use your current instance and simply call self.replaceAllInDir(newDir) instead of rntemp = MultiFileAndFolderRename.new(newDir, 'searchString', 'replaceString'); rntemp.replaceAllInDir(newDir).
I think the wrong instantiation is the major reason why it does not work as expected, but the others should be fixed as well.