How to list the current set of words in Forth

Is it possible to get the list of all words which are currently defined in Forth (for example in Gforth)?

The standard WORDS word prints only the words from the top vocabulary (wordlist) in the search order; see the specification.
The TRAVERSE-WORDLIST proposal defines an API to enumerate the words in a given wordlist. Some Forth systems have already implemented this proposal ([update] it is already in the draft).
There is, however, no standard API to enumerate all defined wordlists. GET-ORDER returns only the context wordlists (i.e. the wordlists currently in scope).
Some Forth systems have a VOCS word that prints all defined vocabularies.
Other possible APIs are specific to a particular Forth system. For example, SP-Forth has an ENUM-VOCS word to enumerate all known wordlists.

What does it mean for a fact used in Isabelle to have a number after the name?

Most of the proofs suggested by Sledgehammer use this notation, with a number inside parentheses:
by (smt (z3) ApplyAllResult.distinct(1)
ApplyResult.case(1)
ApplyResult.case(2)
ApplyResult.exhaust
applyInput.simps(1))
What does it mean for a fact to have such a number?
Isabelle permits the use of fact lists indexed by a natural number starting with 1. Given a fact list fs and an index i, you can access an individual fact from the list by using the syntax fs(i). You can also select multiple facts from the list using multiple indexes (e.g., fs(1,3)), ranges (e.g. fs(2-5), fs(3-)) or a combination of both (e.g., fs(2,4-6)).
Examples of predefined fact lists are assms (which contains the assumptions of a theorem) and f.simps (which contains the equations defining function f).

Is there a difference between Forth vocabularies and word lists?

As I read "Programming Forth" by Stephen Pelc, the text seems to imply that vocabularies and word lists may be separate things. I had thought that dictionary (vocabulary) entries have a name field, code field, etc., so having separate word lists does not make sense to me.
Are word lists just a way of talking about the name fields of Forth words or are word lists actual data structures separate from dictionary entries? (Forth language resources are a bit scarce compared to mainstream languages)
Vocabularies and wordlists are basically the same. I can think of two differences:
Wordlists are specified in the Forth standard. Vocabularies are not, but there's nearly a consensus on how they are used.
A vocabulary has a name. A wordlist just has a numeric id.
A third similar concept is a "lexicon". I haven't seen it used as often as the other two, but I think it's yet another variation or a synonym.
A wordlist is a collection of dictionary entries. It could be a linked list, a hash table, or anything else that works for looking up named entries. The dictionary may be partitioned into several wordlists, i.e. a wordlist is a subset of the dictionary.
A wordlist is the ISO 94 Forth standard's name for what is generally called a namespace. Wikipedia does a good job of explaining this concept. In Forth, multiple namespaces can be coupled to the interpreter, which means that the words in them can be used; coupling a namespace is called "adding the wordlist to the search order". Forth is extensible and has the fairly unique feature that new definitions may be added to any wordlist, even one that is not in the search order. That namespace is called CURRENT, and the actual, momentary search order, consisting of one or more namespaces, is called the CONTEXT. Both can be changed anywhere in an interactive session and anywhere in a compilation.
At the time the standard was created there was no consensus on how to give a name to a wordlist, so there is only the word WORDLIST, which creates an anonymous namespace and leaves its handle on the stack. The word VOCABULARY was widely in use to create a named namespace, but because there was no consensus about it, VOCABULARY is not part of the standard and one cannot use it in portable programs (which is quite cumbersome, so everybody does it anyway and hopes for the best).
So, to summarize: a wordlist as a concept is a namespace. WORDLIST is a standard word that creates an anonymous wordlist. VOCABULARY is not a standard word; in most systems it creates a wordlist and couples a name and a behaviour to it.

AIML Parser PHP

I am trying to develop an artificial bot. I found that AIML is something that can be used for achieving such a goal, and I found these points regarding AIML parsing, which is done by Program O:
1.) All letters in the input are converted to UPPERCASE
2.) All punctuation is stripped out and replaced with spaces
3.) Extra whitespace characters, including tabs, are removed
From there, Program O performs a search in the database, looking for all potential matches to the input, including wildcards. The returned results are then “scored” for relevancy and the “best match” is selected. Program O then processes the AIML from the selected result, and returns the finished product to the user.
I am just wondering how to define the score and find the relevant answer closest to the user's input.
Any help or ideas will be appreciated.
#user3589042 (rather cumbersome name, don't you think?)
I'm Dave Morton, lead developer for Program O. I'm sorry I missed this at the time you asked the question. It only came to my attention today.
The way that Program O scores the potential matches pulled from the database is this:
1. Is the response from the aiml_userdefined table? yes=300/no=0
2. Is the category for this bot, or its parent (if it has one)? this=250/parent=0
3. Does the pattern have one or more underscore (_) wildcards? yes=100/no=0
4. Does the current category have a <topic> tag? yes (see below)/no=0
   a. Does the <topic> contain one or more underscore (_) wildcards? yes=80/no=0
   b. Does the <topic> directly match the current topic? yes=50/no=0
   c. Does the <topic> contain a star (*) wildcard? yes=10/no=0
5. Does the current category contain a <that> tag? yes (see below)/no=0
   a. Does the <that> contain one or more underscore (_) wildcards? yes=45/no=0
   b. Does the <that> directly match the current topic? yes=15/no=0
   c. Does the <that> contain a star (*) wildcard? yes=2/no=0
6. Is the <pattern> a direct match to the user's input? yes=10/no=0
7. Does the <pattern> contain one or more star (*) wildcards? yes=1/no=0
8. Does the <pattern> match the default AIML pattern from the config? yes=5/no=0
The script then adds up the points for all passed tests listed above, and also adds a point for each word in the category's <pattern> that also matches a word in the user's input. The AIML category with the highest score is considered to be the "best match". In the event of a tie, the script will then select either the "first" highest-scoring category, the "last" one, or one at random, depending on the configuration settings. This selected category is then returned to other functions for parsing of the XML.
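For illustration, here is a rough Python sketch of how that tally might be computed; the weights follow the list above, but the function name, the category dictionary, and its keys are hypothetical, not Program O's actual PHP code.

def score_category(category, user_input, current_topic, default_pattern):
    # category is assumed to be a dict describing one candidate AIML category.
    score = 0
    if category.get("from_userdefined_table"):   # response from aiml_userdefined
        score += 300
    if category.get("belongs_to_this_bot"):      # this bot rather than its parent
        score += 250
    pattern = category.get("pattern", "")
    if "_" in pattern:                           # underscore wildcard in the pattern
        score += 100
    topic = category.get("topic")
    if topic:                                    # category has a <topic> tag
        if "_" in topic:
            score += 80
        if topic == current_topic:
            score += 50
        if "*" in topic:
            score += 10
    that = category.get("that")
    if that:                                     # category has a <that> tag
        if "_" in that:
            score += 45
        if that == current_topic:
            score += 15
        if "*" in that:
            score += 2
    if pattern == user_input:                    # direct match to the user's input
        score += 10
    if "*" in pattern:                           # star wildcard in the pattern
        score += 1
    if pattern == default_pattern:               # default AIML pattern from the config
        score += 5
    # Plus one point per pattern word that also appears in the user's input.
    score += len(set(pattern.split()) & set(user_input.split()))
    return score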
I hope this answers your question.

Is there a clear algorithm for sorting/ordering the loops in an X12 file?

Even though loops are kind of a logical concept in X12 (not directly physically represented in the text), every transaction set defines a set of loops that it can contain, including identifiers for the loops and an ordering for them. My question is, what is the rule for sorting loops, generically? Is there a concise set of rules that can be expressed in some code that should be able to take a collection of loops (with known identifiers such as 1000A, 2300BB, etc) and properly sort them?
The context of my question is that I'm working on a general-purpose library that applications will use to construct a model of an X12 document/transaction-set (and write out the text such a model represents). It has objects to represent Elements, Segments, and Loops. Ordering of Segments in a particular Loop is easy; they're dictated by the Implementation Guides. But I'm trying to get Loop ordering (within a Transaction Set) to work generically; that's what I'm asking about.
It seems that the general rule is that loops are ordered based on their identifiers, using the numeric portion as the primary sort key and the alpha portion as the secondary sort key. Of course, hierarchical loops contained in other loops are placed with their parent, with the child loops following the parent in that sort order (e.g. 1000A, 2000A, 2010A, 2010B, 2100, 2300, where 2010A and 2010B are children of 2000A).
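In code, that rule (numeric portion as the primary key, alpha portion as the secondary key) might look something like the following Python sketch; the identifiers come from the example above, and parent/child nesting is not handled here.

import re

def loop_sort_key(loop_id):
    # Split an identifier like "2010B" into its numeric and alpha parts.
    match = re.match(r"(\d+)([A-Z]*)$", loop_id)
    return (int(match.group(1)), match.group(2))

loops = ["2300", "2010B", "1000A", "2100", "2000A", "2010A"]
print(sorted(loops, key=loop_sort_key))
# ['1000A', '2000A', '2010A', '2010B', '2100', '2300']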
I understand that the spec and Implementation Guides contain all of this info; I'm looking for the all-encompassing rule about loop ordering (not Segment ordering). Is there any concise way to express the rule algorithmically? Is there even a hard-and-fast rule at all?
As I mentioned in my comments, the standard has a loop value. Take a look at my screenshot of the Liaison Dictionary Viewer. The CLM segment has a LOOP value of 100. The segments underneath are children of the CLM segment (extended tag). Any "order" can be defined arbitrarily by the partner, or can be in any (undefined) order provided the data is qualified. But that loop can occur 100 times max and can have repeating segments inside the loop value.
The implementation guide will give you the correct order your partner wants them in. It seems like you're writing your own syntax validation engine though.

Latin inflection

I have a database of words (including nouns and verbs). Now I would like to generate all the different (inflected) forms of those nouns and verbs. What would be the best strategy to do this?
As Latin is a highly inflected language, there is:
a) the declension of nouns
b) the conjugation of verbs
See this translated page for an example of a verb's conjugation ("mandare"): conjugation
I don't want to type in all those forms for all the words manually.
How can I generate them automatically? What is the best approach?
a list of complex rules on how to inflect all the words
Bayesian methods
...
There's a program called "William Whitaker's Words". It creates inflections for Latin words as well, so it does exactly what I want to do.
Wikipedia says that the program works like this:
Words uses a set of rules based on natural pre-, in-, and suffixation, declension, and conjugation to determine the possibility of an entry. As a consequence of this approach of analysing the structure of words, there is no guarantee that these words were ever used in Latin literature or speech, even if the program finds a possible meaning to a given word.
The program's source is also available here. But I don't really understand how this is supposed to work. Can you help me? Maybe this would be the solution to my question ...
You could do something similar to the hunspell dictionary format (see http://www.manpagez.com/man/4/hunspell/).
You define two tables. One contains the roots of the words (the part that never changes), and the other contains the modifications for a given class. For a given class, for each declension (or conjugation), it tells you what characters to add at the end (or the beginning) of the root. It can even specify that a given number of characters should be replaced. Now, to get a word in a specific declension, you take the root, apply the transformation for the class it belongs to, and voilà!
For example, for mandare, the root would be mand, and the class would contain suffixes like o, as, at, amus, atis... for the active indicative present.
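A minimal Python sketch of that two-table idea, using only the active indicative present of the first conjugation as illustrative data, might look like this:

# Roots table: lemma -> (root, inflection class).
roots = {"mandare": ("mand", "first-conjugation")}

# Class table: inflection class -> endings, here for the active indicative present.
endings = {
    "first-conjugation": {
        ("1", "sg"): "o",   ("1", "pl"): "amus",
        ("2", "sg"): "as",  ("2", "pl"): "atis",
        ("3", "sg"): "at",  ("3", "pl"): "ant",
    }
}

def inflect(lemma, person, number):
    root, cls = roots[lemma]
    return root + endings[cls][(person, number)]

print(inflect("mandare", "1", "sg"))   # mando
print(inflect("mandare", "2", "pl"))   # mandatis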
I'll use the nouns as an example, but it also applies to verbs.
First, I would create two classes: Regular and Irregular. For the Regular nouns, I would make three classes for the three declensions, and make them all implement a Declensable (or however the word is in English :) interface (FirstDeclension extends Regular implements Declensable). The interface would define two static enums (NOMINATIVE, VOCATIVE, etc., and SINGULAR, PLURAL).
All would have a string for the root and a static hashmap of suffixes. The method FirstDeclension#get(case, number) would then append the right suffix based on the hashmap.
The Irregular class would have to define a local hashmap for each word and then implement the same Declensable interface.
Does it make any sense?
Addendum: To clarify, the constructor of class Regular would be
public Regular (String stem) {
    this.stem = stem;
}
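For comparison, here is a compact Python sketch of the same Regular/Irregular idea; the class names and the suffix table are hypothetical, and only a few first-declension forms are shown.

# Shared suffix table for one declension class (only a few forms shown).
FIRST_DECLENSION = {
    ("NOMINATIVE", "SINGULAR"): "a",
    ("NOMINATIVE", "PLURAL"): "ae",
    ("ACCUSATIVE", "SINGULAR"): "am",
    ("ACCUSATIVE", "PLURAL"): "as",
}

class Regular:
    def __init__(self, stem, suffixes):
        self.stem = stem
        self.suffixes = suffixes

    def get(self, case, number):
        # Append the right suffix for the requested case and number.
        return self.stem + self.suffixes[(case, number)]

class Irregular:
    def __init__(self, forms):
        # One local table of full forms per irregular word.
        self.forms = forms

    def get(self, case, number):
        return self.forms[(case, number)]

rosa = Regular("ros", FIRST_DECLENSION)
print(rosa.get("ACCUSATIVE", "PLURAL"))   # rosas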
Perhaps you could follow the line of AOT in your implementation (it's under the LGPL):
http://prometheus.altlinux.org/en/Sisyphus/srpms/aot
http://seman.sourceforge.net/
http://aot.ru/
There's no Latin morphology in AOT, only Russian, German, and English, but Russian is of course an example of an inflectional morphology as complex as Latin's, so AOT should be ready as a framework for implementing it.
Still, I believe one has to have an elaborate, precise formal system for the morphology clearly defined before going on to programming. As for Russian, I guess most of the working morphological computer systems are based on the serious analysis of Russian morphology done by Andrey Zalizniak in the Grammatical Dictionary of Russian and related works.
