How to make text file (or other documents') parser? [closed] - parsing

It's difficult to tell what is being asked here. This question is ambiguous, vague, incomplete, overly broad, or rhetorical and cannot be reasonably answered in its current form. For help clarifying this question so that it can be reopened, visit the help center.
Closed 10 years ago.
I have the following task: to fill a spell-check dictionary (a simple txt file), I need a parser
that will go through a text file (or another type of document), extract
each word, and then create a text file with a simple list of words like this:
adfadf
adfasdfa
adfasfdasdf
adsfadf
...
etc
What scripting language and library would you suggest? If possible, please give a code example (especially for extracting each word). Thanks!

What you want is not a parser, but just a tokenizer. This can be done in any language with a bunch of regular expressions, but I do recommend Python with NLTK:
>>> from nltk.tokenize import word_tokenize
>>> word_tokenize('Hello, world!')
['Hello', ',', 'world', '!']
Generally, just about any NLP toolkit will include a tokenizer, so there's no need to reinvent the wheel; tokenizing isn't hard, but it involves writing a lot of heuristics to handle all the exceptions such as abbreviations, acronyms, etc.
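As the answer notes, a plain regular expression is enough for a rough word list. The sketch below takes that route instead of NLTK; the pattern (runs of letters with optional internal apostrophes), the lowercasing, and the function names are assumptions made for illustration, not part of the original answer.

```python
import re


def extract_words(text):
    """Return the unique words in text, lowercased and sorted."""
    # Assumption: a "word" is a run of letters, optionally joined by
    # apostrophes (so "don't" stays whole). Digits and punctuation are dropped.
    words = re.findall(r"[^\W\d_]+(?:'[^\W\d_]+)*", text)
    return sorted({w.lower() for w in words})


def build_dictionary(source_path, output_path):
    """Read a text file and write one word per line to output_path."""
    with open(source_path, encoding="utf-8") as f:
        words = extract_words(f.read())
    with open(output_path, "w", encoding="utf-8") as f:
        f.write("\n".join(words) + "\n")
    return words
```

Calling build_dictionary with a source file and an output file (both paths are up to you) produces the one-word-per-line list described in the question. A toolkit tokenizer will handle abbreviations and similar edge cases better than this pattern does.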

Related

How to insert data into my app efficiently? [closed]

Closed 9 years ago.
I have a lot of data formatted like this:
『 No.1
introduction:
explanation:
Parts1:
Parts2:
..
..
parts8.
No.2
introduction:
explanation:
....
....
....
....
No.100
...
』
I am setting up my app's model to hold the data in an NSMutableDictionary,
so that I can look up the data by key.
The problem is that there is a lot of data (over 500 sets). Is there an efficient way to insert the data without tedious typing?
Please help.
Thank you! ^_^
Create either a JSON file or a property list file and use the built-in JSON or property list parsing facilities to read the file. Much better than building your own parser.
The way I would approach this problem is to open your file in a text editor and change all the : to pipes |. Now you have a pipe-delimited file that you can parse.
No.1 introduction| explanation| Parts1| Parts2| parts3| (this is on line1)
No.2 introduction| explanation| Parts1| Parts2| parts3| (this is on line2)
Now read this file into a string and go over it line by line. Split each line into its values, put them in an array, and then save the array in your NSDictionary under a key. I will try to search for some sample code. From here you know
array(0) is - No.1 introduction
array(1) is - explanation
...
See this post: NSString tokenize in Objective-C
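The two suggestions above can be combined: a small one-off script can convert the existing text into a JSON file, which the app then reads with its built-in JSON facilities. The sketch below is in Python and assumes each record starts with a line such as "No.1" followed by "field: value" lines; the function name and the sample values are hypothetical.

```python
import json
import re


def parse_records(text):
    """Turn 'No.N' blocks of 'field: value' lines into a nested dict."""
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if re.fullmatch(r"No\.\d+", line):
            # A new record starts; its header line becomes the key.
            current = records.setdefault(line, {})
        elif current is not None and ":" in line:
            # Split on the first colon only, so values may contain colons.
            field, _, value = line.partition(":")
            current[field.strip()] = value.strip()
    return records


sample = "No.1\nintroduction: first\nexplanation: why\nNo.2\nintroduction: second\n"
as_json = json.dumps(parse_records(sample), indent=2)
```

Writing as_json to a file gives one object per record, keyed by "No.1", "No.2", and so on, so nothing has to be typed in by hand.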

Sort an array/table of words from shortest to longest [closed]

Closed 10 years ago.
In Corona/Lua, how do I sort a table of strings from shortest to longest?
Assuming your table is an indexed table and not a keyed one, try:
test = {'123','1234','1245','1','12'}
table.sort(test, function(a,b) return #a<#b end)
for i,v in ipairs(test) do
  print(i,v)
end
The important line here is
table.sort(test, function(a,b) return #a<#b end)
Words will only be sorted by length, and the order within matching lengths will be arbitrary. If you want to sort by additional criteria, extend the comparison function, e.g. to break ties alphabetically:
function(a,b) if #a ~= #b then return #a<#b end return a<b end
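As one illustration of sorting by an additional criterion, here is a sketch in Python (not Lua) that sorts by length and breaks ties alphabetically. The function name is hypothetical; the data is the table from the answer.

```python
def sort_by_length(words):
    """Sort shortest to longest; equal-length words go in alphabetical order."""
    # The key is a tuple: length decides first, the string itself breaks ties.
    return sorted(words, key=lambda w: (len(w), w))


result = sort_by_length(["123", "1234", "1245", "1", "12"])
```

Because the key is a tuple, further criteria can be added as extra tuple elements rather than by nesting comparisons.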

How to implement query searching in a specific cluster after document clustering? [closed]

Closed 10 years ago.
I have two clusters, each an instance of a class with these fields:
Cluster : class
DocumentList : List<Document>
centroidVector : Map<String,Double>
Now the problem is that when a query is searched, it is parsed as a file and made into a Document object, added to documentIndex, and indexed along with the other documents. I did that because it had to go through the same procedure, i.e. tokenizing, stemming, etc. But now I want to implement query search within the specific cluster whose vector is most similar to the query vector, i.e. a dot product of roughly 0.5 to 1. So I would have to take the dot product between the query vector and each cluster's centroid vector. But I don't know how to implement it, because the index is created in memory and is not stored in the database; I am still in the process of doing that.
Thank you
Clustering is not meant for searching (i.e. indexing etc.). It is an analysis step meant to find possible unknown structure within your data set, not to retrieve information faster.
You can exploit the structure sometimes for faster search, but then you need an index that can make use of this.
Just build an index right away if you want to do similarity search! Then try to improve the index by doing some clustering beforehand.
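For the narrower step the question asks about, picking the cluster whose centroid is most similar to the query, a minimal sketch follows. It is in Python rather than the question's Java, represents each vector as a term-to-weight mapping (mirroring centroidVector), and normalizes the dot product into a cosine similarity; the normalization, the function names, and the sample weights are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term -> weight vectors."""
    dot = sum(weight * b.get(term, 0.0) for term, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def closest_cluster(query_vector, centroids):
    """Return the name of the centroid most similar to the query."""
    return max(centroids, key=lambda name: cosine_similarity(query_vector, centroids[name]))


centroids = {
    "sports": {"ball": 1.0, "team": 1.0},
    "cooking": {"oven": 1.0, "salt": 1.0},
}
best = closest_cluster({"ball": 1.0}, centroids)
```

This works entirely in memory, so it does not depend on the index being stored in a database. The search itself would then run only over the documents of the chosen cluster.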

Haskell: *** Exception: Prelude.read: no parse, Parsing, Read File [closed]

Closed 10 years ago.
I am trying to read values from a .txt file and keep getting the above error. My code reads:
file <- readFile "films.txt"
let database = (read file :: [Film])
and Film is a data type I declared as:
type Film = (String, String, Int)
I am still quite new to Haskell, so I have no idea how to parse a string back into the required type. I sort of assumed it would be nice and do it for me, like it does with writeFile!
Any hints?

saving strings that are 'connected' and reading them and their 'connected' [closed]

Closed 12 years ago.
This is something I am not at all familiar with.
I want to try to make a simple form with 4 edit boxes, 2 at the top, 2 at the bottom, and a button. Basically what I want to do is type a couple of things in the top two boxes that are related to each other.
When I have them both filled in, I click on the button and it saves this information in a database, preferably an external file (it doesn't have to be text; I think it would be better if not). So I can do that a couple of times, saving from the edit fields into a database.
Then when I type one of the words saved in one of the edit fields at the bottom it automatically types the other word in the last edit field. The form should remember to connect to the database every time it's opened so that when I open it another time I can still work the edit fields.
Can anyone advise me on how to do this?
What you are looking for is known as a dictionary, if I understand you correctly. In other languages it is known as an associative array or sometimes a hash.
You are going to want a modern version of Delphi, I'd guess 2010 or XE. If you can't access those then you'd need a 3rd party library, or a home-grown one based on TStringList. In fact TStringList can operate in a dictionary-like mode, but it's a bit clunky.
You declare the dictionary as follows:
dict: TDictionary<string,string>;
You can add to it as follows:
dict.Add(box1.Text, box2.Text);
The first parameter is the key. The second is the value. Think of this as an array but indexed with a string rather than an integer.
If you want to recover a value then you use:
dict[key];
In your case you would write:
box4.Text := dict[box3.Text];
If you want to save to a file then you would iterate over the dict:
var
  item: TPair<string,string>;
...
for item in dict do
  AddToTextFile(item.Key, item.Value);
I've ignored all error handling issues, dealing with adding keys that already exist, asking for keys that are not in the dict, and so on. But this should give you a flavour.
I'd recommend reading up on associative arrays if you aren't already familiar with them. I'm sure there will be a page on Wikipedia, and you could do worse than read a tutorial on Python, which is sure to cover them; the issues are really the same no matter what language you consider.
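Since the answer points to Python as a way to get familiar with the idea, here is a minimal Python sketch of the same flow: store pairs in a dictionary, save them to an external file, and load them again on the next start. The JSON file format, the function names, and the sample pair are assumptions, not the Delphi code from the answer.

```python
import json
import os


def load_pairs(path):
    """Load saved pairs, or start with an empty dictionary if no file exists."""
    if not os.path.exists(path):
        return {}
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def save_pairs(pairs, path):
    """Write all key/value pairs to a JSON file."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(pairs, f, ensure_ascii=False, indent=2)


def lookup(pairs, key):
    """Return the value connected to key, or an empty string if unknown."""
    return pairs.get(key, "")
```

In the form described in the question, the button handler would add the two top fields to the dictionary and call save_pairs, and the bottom field would be filled from lookup. Using get with a default avoids an error when the typed word has not been saved yet.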
