Is there a way in Python to parse a .txt file using a keyword supplied by the user? - parsing

I am hoping to parse a .txt file, but only after taking user input for the keyword(s) to search for.
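One way this could look is sketched below. The question does not say what "parse" should return, so this assumes the goal is to list every line that contains at least one of the keywords. The function name `find_keyword_lines` and the `case_sensitive` flag are made up for illustration.

```python
def find_keyword_lines(path, keywords, case_sensitive=False):
    """Return (line_number, line) pairs for lines containing any keyword."""
    if not case_sensitive:
        keywords = [k.lower() for k in keywords]
    matches = []
    # Read line by line so large files are not loaded into memory at once.
    with open(path, encoding="utf-8") as handle:
        for number, line in enumerate(handle, start=1):
            haystack = line if case_sensitive else line.lower()
            if any(k in haystack for k in keywords):
                matches.append((number, line.rstrip("\n")))
    return matches
```

In an interactive script the keywords could come from `input("Keyword(s): ").split()`, and the returned pairs can then be printed or processed further.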

Related

Is there any way to parse a btsnoop log capture file and extract data from it by filtering using python or any other way?

I want to parse a btsnoop log file and extract the L2CAP data from it. I am able to convert the .cfa file into a txt file which contains hex data. I am getting extra data with the raw data.

How can I parse a .docx file containing images with Ruby?

I have a .docx file. It contains lots of images as well as text. I want to parse the data along with the images and convert it into HTML. How can I do this? I tried the docx gem, but it only reads the text part and skips the images.

How do I go about converting .docx into an array or dictionary?

I am currently trying to build an iOS game that matches words with their definitions, for myself and my classmates.
I'm having a hard time figuring out how to convert a list of words with their definitions in a .docx file into something (JSON, XML, ...) that I can then read into an Array or Dictionary.
Most of the words in the .docx have the following format:
" Word (): Definition. "
This would be easier with Excel. Excel can already export to XML, which should make your life a lot easier than getting all the words out of the .docx and then converting them to either JSON or XML.
http://www.excel-easy.com/examples/xml.html
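If the list stays as text rather than moving to Excel, the "Word (): Definition." lines could also be converted with a small script. The sketch below assumes the text has already been extracted from the .docx, one entry per line (for example by saving the document as plain text), and that the parentheses may be empty or hold a short note. The sample entries and the name `parse_entries` are made up for illustration.

```python
import json
import re

# Assumed line shape: "Word (optional note): Definition."
ENTRY = re.compile(
    r"^\s*(?P<word>[^()]+?)\s*\((?P<note>[^)]*)\)\s*:\s*(?P<definition>.+?)\s*$"
)


def parse_entries(lines):
    """Map each word to its definition, skipping lines that do not match."""
    entries = {}
    for line in lines:
        match = ENTRY.match(line)
        if match:
            entries[match.group("word")] = match.group("definition")
    return entries


# Hypothetical sample data in the format described in the question.
sample = [
    "Ephemeral (): Lasting for a very short time.",
    "Lucid (): Expressed clearly.",
    "not an entry",
]
print(json.dumps(parse_entries(sample), indent=2))
```

The resulting JSON object can be bundled with the app and read into a dictionary there.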

retrieve txt content of as many file types as possible

I maintain a client-server DMS written in Delphi/SQL Server.
I would like to allow users to search for a string inside all the documents stored in the DB (files are stored as blobs, zipped to save space).
My idea is to index them on "checkin": as I store a new file, I extract all the text information in it and put it in a new DB field. So my files table will be:
ID_FILE integer
ZIPPED_FILE blob
TEXT_CONTENT text field (nvarchar in sql server)
I would like to support "indexing" of at least the most common text-like files, such as pdf, txt, rtf, doc, docx, maybe adding xls, xlsx, ppt and pptx.
For MS Office files I can use ActiveX, since I already do that in my application, and for txt files I can simply read the file, but what about pdf and odt?
Could you suggest the best technique, or even a 3rd party component (it need not be free), that parses with "no fear" all file types?
Thanks
Searching documents this way would be very slow and inconvenient to use. I'd advise you to create two additional tables instead of the TEXT_CONTENT field.
When you parse the text, you should extract the valuable words and try to standardise them so that you
- get rid of lower/upper case problems
- get rid of characters that might be used interchangeably.
i.e. in Turkish we have the ç character, which might be entered as c.
- get rid of verbs that are common in the language you are dealing with.
i.e. in "Thing I am looking for", "Thing" and "Looking" might be of interest to you
- get rid of whatever other problems you face.
Each word that already has an entry in the string_search table should re-use the ID already given there.
The tables may look like this.
original_file_table
  zip_id number
  zip_file blob
string_search
  str_id number
  standardized_word text (or any string type with an appropriate secondary index)
file_string_reference
  zip_id number
  str_id number
I hope this gives you an idea of what I have in mind.
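The three-table layout above can be sketched roughly as follows. This is only an illustration under several assumptions: it uses Python's built-in sqlite3 instead of SQL Server, the text is assumed to be already extracted from each file, and the stop-word list, the accent stripping and the helper names (`normalize`, `extract_words`, `index_file`, `search`) are made up here rather than taken from the answer.

```python
import re
import sqlite3
import unicodedata

STOP_WORDS = {"i", "am", "for", "the", "a", "an"}  # illustrative, not exhaustive


def normalize(text):
    """Lower-case text and strip accents so that 'ç' and 'c' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def extract_words(text):
    """Return the set of standardized words worth indexing."""
    return set(re.findall(r"\w+", normalize(text))) - STOP_WORDS


def create_schema(conn):
    conn.executescript("""
        CREATE TABLE original_file_table (zip_id INTEGER PRIMARY KEY, zip_file BLOB);
        CREATE TABLE string_search (
            str_id INTEGER PRIMARY KEY,
            standardized_word TEXT UNIQUE
        );
        CREATE TABLE file_string_reference (
            zip_id INTEGER REFERENCES original_file_table(zip_id),
            str_id INTEGER REFERENCES string_search(str_id),
            PRIMARY KEY (zip_id, str_id)
        );
    """)


def index_file(conn, zipped_bytes, text):
    """Store a file and link it to each of its standardized words."""
    cur = conn.execute(
        "INSERT INTO original_file_table (zip_file) VALUES (?)", (zipped_bytes,)
    )
    zip_id = cur.lastrowid
    for word in extract_words(text):
        # Re-use the existing str_id when the word is already known.
        conn.execute(
            "INSERT OR IGNORE INTO string_search (standardized_word) VALUES (?)",
            (word,),
        )
        (str_id,) = conn.execute(
            "SELECT str_id FROM string_search WHERE standardized_word = ?", (word,)
        ).fetchone()
        conn.execute(
            "INSERT INTO file_string_reference (zip_id, str_id) VALUES (?, ?)",
            (zip_id, str_id),
        )
    return zip_id


def search(conn, query):
    """Return ids of files that contain every significant word of the query."""
    words = extract_words(query)
    if not words:
        return []
    placeholders = ",".join("?" for _ in words)
    rows = conn.execute(
        f"""SELECT r.zip_id FROM file_string_reference r
            JOIN string_search s ON s.str_id = r.str_id
            WHERE s.standardized_word IN ({placeholders})
            GROUP BY r.zip_id
            HAVING COUNT(DISTINCT s.str_id) = ?
            ORDER BY r.zip_id""",
        (*words, len(words)),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
create_schema(conn)
first = index_file(conn, b"", "The thing I am looking for")
second = index_file(conn, b"", "Bir bardak çay")
print(search(conn, "Looking thing"))
print(search(conn, "cay"))
```

Because lookups go through the indexed standardized_word column, a search does not have to scan the full text of every document.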
Your major problem is zipping your files before putting them as a blob in your database which makes them unsearchable by the database itself. I would suggest the following.
Don't zip files you put in the database. Disk space is cheap.
You can write a query like this as long as you save the files in a text field.
Select * from MyFileTable Where MyFileData like '%Thing I am looking for%'
This is slow but it will work. This will work because the text in most of those file types is in plain text not binary (though some of the newer file types are now binary)
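As a rough illustration of the LIKE approach, here is a sketch using Python's built-in sqlite3 rather than SQL Server. The table and column names follow the query above; the sample rows, the `id` column and the `search_files` helper are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyFileTable (id INTEGER PRIMARY KEY, MyFileData TEXT)")
conn.executemany(
    "INSERT INTO MyFileTable (MyFileData) VALUES (?)",
    [("Here is the Thing I am looking for.",), ("Unrelated report.",)],
)


def search_files(conn, phrase):
    """Return ids of rows whose text contains the phrase (full table scan)."""
    # Escape LIKE wildcards so the user's input is matched literally.
    escaped = phrase.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    rows = conn.execute(
        "SELECT id FROM MyFileTable WHERE MyFileData LIKE ? ESCAPE '\\'",
        (f"%{escaped}%",),
    ).fetchall()
    return [row[0] for row in rows]


print(search_files(conn, "Thing I am looking for"))
```

Passing the phrase as a bound parameter instead of concatenating it into the SQL string avoids injection problems. A pattern with a leading % generally cannot use an ordinary index, which is why this kind of query is slow on large tables.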
The other alternative is to use an indexing engine such as Apache Lucene or Apache Solr which, as you put it, parses with "no fear" all file types.

iOS: Read in XLS

I'm trying to figure out how to read in the contents of an XLS document and I'm able to get the bytes just fine, but I don't have any clue where to go from here. Trying [[NSString alloc] initWithBytes:data.bytes length:data.length encoding:NSUTF8StringEncoding] and [NSString stringWithUTF8String:data.bytes] both don't get me anywhere (null). What are you supposed to do to read in the contents of an XLS file?
Trying to combine two answers.
"There is no innate ability to read Excel data into a Foundation container, like an NSArray or NSDictionary. You could, however, convert the file (with Excel) to a comma-separated-value (CSV) file and then parse each line's cells on the iPhone using the NSString instance method -componentsSeparatedByString:."
"A comma-separated values (CSV) file stores tabular data (numbers and text) in plain-text form. Plain text means that the file is a sequence of characters, with no data that has to be interpreted instead, as binary numbers. A CSV file consists of any number of records, separated by line breaks of some kind; each record consists of fields, separated by some other character or string, most commonly a literal TAB or comma. Usually, all records have an identical sequence of fields"
--
How to read cell data from an Excel document with objective-c
objective-c loading data from excel
Even though saving your Excel file to CSV is the easier answer, sometimes that's not really what you're looking for, so I created QZXLSReader. It's a drag-and-drop solution so it's a lot easier to use. I don't think it's as feature complete, but it worked for me.
It's basically a library that can open XLS files and parse them into Obj-C classes. Once you have the classes, it's very easy to send them to Core Data or a dictionary or what have you.
I hope it helps!
