Can predefined keywords exist alongside free keywords in DSpace submission?

By default, a submitter (uploader) of a document can add self-chosen keywords to that document.
It is also possible to configure DSpace in a way that the submitter has to choose from one or more predefined keywords (controlled vocabulary).
The DSpace manual seems to suggest that, when configuring, you have to choose between free and predefined keywords.
I would like the submitter to be able to choose one or more predefined keywords and also add one or more self-chosen keywords.
Is that possible?

The hierarchical taxonomy feature gives you exactly this:
https://wiki.duraspace.org/display/DSDOC5x/Authority+Control+of+Metadata+Values#AuthorityControlofMetadataValues-HierarchicalTaxonomiesandControlledVocabularies
You can see it in the demo installation on the "subject" field: you have a lookup feature that allows lookup in a tree of subjects, but manually entered values are possible as well.
screencast:
http://screencast.com/t/0Cth3mORwxd

I personally would set this up to use two different metadata fields.
Something like dc.subject.whateverdescribesyourlistoffixedterms -- or even localschema.subject.whateverdescribesyourlistoffixedterms -- for the list of terms the user should select from. Note, for "whateverdescribesyourlistoffixedterms" I would choose something related to the name of the list of terms if at all possible (see example below).
dc.subject for "standard" user-supplied keywords
Then just add both to your input forms, perhaps going with Bram's suggestion of a hierarchical taxonomy for the first.
To give you better advice on what's most appropriate, it would be great if you could give some more details about what you're trying to achieve. For example:
Is your list of fixed keywords something that's used beyond your own organisation? If yes, to me this strongly points to giving it its own metadata field, with a qualifier related to the name of the classification system, e.g. dc.subject.anzsrc for the Australia/New Zealand fields of research codes.
Do you want to mix the two types of keywords in browse/facet options? You can do this even when they're in two separate fields. Have a look at the Discovery search filters & sidebar facets documentation and see how that puts dc.contributor.author and dc.creator into the author facet. The documentation for browse indexes has a similar example in the author browse.
Are both types of subject keywords required for submission? Both optional? One type required, the other type optional? You say in a comment (if I read you correctly) that you want the fixed keywords to be mandatory during submission, while the free-text keywords should be optional. That means they must be in separate metadata fields, because otherwise you couldn't tell whether the keywords a submitter gives come from the fixed list of terms or not. If you use separate fields, you can make e.g. dc.subject.anzsrc a required field in the submission form and dc.subject an optional one.

Related

how to create a replicable, unique code for a pre-ISBN book

I am putting my collection of some 13000 books in a MySQL database. Most of the copies I possess
can be identified uniquely by ISBN. I need to use this distinguishing code as a foreign key into
another database table.
However, quite a few of my books date from pre-ISBN ages. So for these, I am trying to devise a
scheme to uniquely assign a code, sort of like an SKU.
The code would be strictly for private use. It should have the important property that, when I
obtain a pre-ISBN publication, I could build the code from inspecting the work, and based on the
result search the database to see if I already have other copies in my possession.
Many years ago I think I saw a search scheme for some university(?) catalogue, where you could
perform a search of a title based on a concatenated string (or code) that was made up of, let's
say 8 letters from the title, and 4 from the author, and maybe some other data. For example,
to search 'The Nature of Space and Time' by Stephen Hawking and Roger Penrose you might perform
a search on the string 'Nature SHawk', being comprised of 8 characters from the title (omitting
non-filing words and stopwords) and 4 from the author(s).
I haven't been able to find any information on such schemes, or whether or not such an approach
was standardized in any way.
Something along these lines could be made up of course, but I was wondering if people here have
heard of such schemes, or have ideas on how to come to a solution to this.
So keep in mind the important property of 'replicability': using the scheme, inspection of a pre-
ISBN dated work should --omitting very special or exclusive cases-- in general lead to a code
that can singly be used to subsequently determine if such a copy is already in the database.
Thank you for your time.
Just use the Title (add Author and Publisher as options) and a series id to produce a fake isbn. Take a look at fake_isbn.
NOTE: use the first digit as a series id but don't use 9!
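The title-plus-author scheme described in the question can be sketched in a few lines. This is only an illustration under stated assumptions, not a standardized cataloguing scheme: the function name `book_key`, the stopword list, the underscore padding, and the choice of the last word of the first author's name as the surname are all made up for this example.

```python
import re
import unicodedata

# Hypothetical stopword list; extend it for the languages in your collection.
STOPWORDS = {"a", "an", "the", "of", "and", "in", "on", "to", "for"}


def _ascii_words(text):
    """Strip accents, then split the text into alphanumeric words."""
    folded = (
        unicodedata.normalize("NFKD", text)
        .encode("ascii", "ignore")
        .decode("ascii")
    )
    return re.findall(r"[A-Za-z0-9]+", folded)


def book_key(title, first_author, title_len=8, author_len=4):
    """Build a fixed-width key: title_len chars of title + author_len of surname."""
    # Drop stopwords, rejoin with single spaces, and cut to the title width.
    words = [w for w in _ascii_words(title) if w.lower() not in STOPWORDS]
    title_part = " ".join(words)[:title_len].ljust(title_len, "_")
    # Assumption: the surname is the last word of the first author's name.
    name_words = _ascii_words(first_author)
    surname = name_words[-1] if name_words else ""
    author_part = surname[:author_len].ljust(author_len, "_")
    return title_part + author_part


print(book_key("The Nature of Space and Time", "Stephen Hawking"))  # Nature SHawk
```

Different editions of the same work will produce the same key, so a key like this is better suited to finding candidate duplicates than to serving as a unique identifier on its own. Appending something like the publication year is one possible way to narrow it down.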

Delphi - What Structure allows for SAVING inverted index type of information?

Delphi XE6. I'm looking to implement a limited style of search, specifically an edit field where the user enters a business name to be looked up. I need to allow the user to enter multiple words, or parts of multiple words. For example, for a business "First Bank of Kansas", the user should be able to enter "Fir Kan", and it should return a match. This means an inverted index type of structure: some type of list of each unique word, each with a list of IDs (document ID, primary key ID, etc., each an integer). I am struggling with WHAT type of structure to make this... I have approximately 250,000 business names, which contain 43,500 unique words. Word counts vary from 1 occurrence of a word to several thousand (company, corporation, etc.). I have some requirements...
1). Assume the user enters BAN. I need to find ALL words that start with BAN. I need to return BANK, BANKER, etc... This means that whatever structure I use, I have to be able to find BAN and then move to the next alphabetic entry... and keep moving to the next until I find a value that does NOT start with BAN. This eliminates any type of HASH structure, correct?
2). I obviously want this to be fast. HASH is the fastest, but I can't use this, correct? See requirement 1.
3). Each entry in this structure needs to be able to hold a list of integers. If I end up going with a LinkedList, then each element has to hold a list of Integers.
4). I need to be able to save and load this structure. I don't want to have to build it each time I use it.
Whatever I end up with, it appears to have to be a NESTED structure, a higher level list (LinkedList?) with each node being an Integer List.
What am I looking for? What do commercial products use? Outlook and others have search capabilities.
Every word is linked to a specific set of IDs, each representing a business name, right?
I recommend a binary search tree, because a lookup in a balanced tree takes O(log n) steps, which is quite fast. If business names change at runtime, a self-balancing tree such as an AVL tree should do well, although implementing one yourself is quite some work. There should be many ready-to-use binary tree units available online.
For each word in the tree that matches one of the entered terms, take its list of IDs and merge those lists per entered term.
As the last step, take all those merged lists of IDs and intersect them.
Only the IDs that match all entered terms are left. Those IDs reference the business names being searched for.
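The lookup-then-intersect steps above can be sketched as follows. This is a language-neutral illustration written in Python rather than Delphi, and it swaps the binary tree for a sorted word list with binary search, which gives the same "find the prefix, then walk forward" behaviour. All names (`build_index`, `prefix_ids`, `search`) are hypothetical.

```python
import bisect
from collections import defaultdict


def build_index(names):
    """Map each upper-cased word to the sorted IDs of the names containing it."""
    postings = defaultdict(set)
    for business_id, name in names.items():
        for word in name.upper().split():
            postings[word].add(business_id)
    return {word: sorted(ids) for word, ids in postings.items()}


def prefix_ids(index, sorted_words, prefix):
    """Union of the IDs of every indexed word that starts with prefix."""
    prefix = prefix.upper()
    ids = set()
    # Binary search for the first word >= prefix, then walk forward
    # until a word no longer starts with the prefix.
    pos = bisect.bisect_left(sorted_words, prefix)
    while pos < len(sorted_words) and sorted_words[pos].startswith(prefix):
        ids.update(index[sorted_words[pos]])
        pos += 1
    return ids


def search(index, sorted_words, query):
    """IDs of the names that contain a matching word for every query term."""
    terms = query.split()
    if not terms:
        return []
    result = prefix_ids(index, sorted_words, terms[0])
    for term in terms[1:]:
        result &= prefix_ids(index, sorted_words, term)
    return sorted(result)
```

With `index = build_index({1: "First Bank of Kansas", 2: "First National Bank"})` and `words = sorted(index)`, calling `search(index, words, "Fir Kan")` narrows the result to ID 1. Because the index is a plain mapping from strings to integer lists, it can be written to disk and reloaded, for example with `json.dump` and `json.load`, instead of being rebuilt on every start.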

CRUD operations on list of items instead of a single one in ChicagoBoss

I'm a newbie to ChicagoBoss and Erlang in general, so please bear with me.
I have a model of Options which represent a number of site configurations (think of the available options in WordPress, since it's modeled after it), on which I have to perform CRUD operations.
The model looks like this:
-module(options,
    [
        Id,
        KeyName::string(),
        Value::string(),
        IsActive::string()
    ]
).
-compile(export_all).
Each option is prefixed by its category, so general options names look like "general_option_" followed by its specific name.
The views for Options are mostly a list of inputs with each input linked to a specific option, as you might expect.
Since the number and name of options is not known beforehand (except in the view), I would like to know what approaches there are for dealing with this case, as every example I've seen so far deals with a single item, and not a list of them. Please share any advice or constructive criticism you have, as it will be very welcome.

form-only lookup

How can I create a form-only lookup in Informix 4GL? I am using a form painter together with Informix SE. Any help would be appreciated. I tried to create the form, but the field is empty when selecting the choice. I think I am missing the relation or something.
FORMONLY is the equivalent of DISPLAYONLY in isql perform screens. Why not just define the database columns in the attributes section and use the NOUPDATE attribute for each column, or use BEFORE EDITUPDATE OF tabname, ABORT?
Since I4GL doesn't come with a form painter, the only ways to know what you can do with it is by reading the manual for your form painter, or by experimenting.
I'm also not entirely sure what you mean by a FORMONLY lookup? It could be any of a number of items. But the basics are that the field in the form is FORMONLY.fieldname TYPE xyz where xyz is the appropriate type. You use a CONSTRUCT or INPUT to get data into that field; you process the input to do the lookup. INPUT is more appropriate for an exact value lookup; CONSTRUCT will allow more flexible querying.
Since you've not shown what you've tried, nor indicated which form painter you're using, it is going to be hard to help further.
(And I note you've asked this question on the IIUG (International Informix Users Group) mailing list for 'classics' too.)

is there an algorithm to find out which words in a search-string belong together?

I was thinking about text driven search by user input.
Often you are searching in a database of addresses, where you can find customers and so on.
Does anybody have an idea how to find out which of the typed words is the name, which is the street name, and which is the company name?
And secondly, if the name is a double name like "Lee Harvey", how can I find out that the two words Lee and Harvey belong together?
The same problem arises with company names like "frank the baker inc."...
Is there any algorithm or best practice strategy?
thanks for links, tutorials, scripts and all other help ;-)
What you basically want is a search engine :) Here are the basic steps you need to follow -
You need to create an 'Inverted Index' of the content you want to be searched on.
The index is a set of 'name'=>'value' pairs. You can build these pairs in whichever way you want (tuned according to your data & needs).
Eg. for your problem of double names, you could split all your names into single words & index it like so -
'lee'=>'lee harvey'
'harvey'=>'lee harvey'
...
this way when anyone searches for 'lee' they get 'lee harvey'. There are other better approaches to this called "n-gram" indexing. Check it out...
You could build separate indexes of names, addresses, emails etc., and when the user types a query, check it against all your indexes with the approach suggested above. After you get the results, merge them. You could also introduce the notion of rank so that you can sort your results and show the latest or most relevant ones at the top. For this you need to figure out a way to score your terms...
Don't care, just perform a full-text search. Then check, for each result item, which field contains the search terms. You may also display items in separate lists (terms found in name, terms found in address). The only difficulty is that if John Smith lives on John Smith Street, you must decide which list or lists the result item belongs to.
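A minimal sketch of the per-field approach from both answers, assuming records are plain dictionaries with `name`, `street` and `company` fields (the field names and the functions `build_field_index` and `classify_terms` are hypothetical). It indexes every word per field and then reports, for each typed term, which fields it occurs in:

```python
import re
from collections import defaultdict


def build_field_index(records, fields):
    """Map token -> field -> set of record IDs containing that token."""
    index = defaultdict(lambda: defaultdict(set))
    for record_id, record in records.items():
        for field in fields:
            # Lower-case and split on non-word characters ("Inc." -> "inc").
            for token in re.findall(r"\w+", record.get(field, "").lower()):
                index[token][field].add(record_id)
    return index


def classify_terms(index, query):
    """For each query term, list the fields it occurs in and the matching IDs."""
    result = {}
    for term in re.findall(r"\w+", query.lower()):
        hits = index.get(term, {})
        result[term] = {field: sorted(ids) for field, ids in hits.items()}
    return result
```

For a record such as `{"name": "Lee Harvey", "street": "Main Street"}`, both `lee` and `harvey` come back under the `name` field with the same record ID, which is one way to infer that the two words belong together. A term that comes back under several fields, as in the John Smith Street case, is genuinely ambiguous and has to be shown in more than one list or resolved by ranking.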

Resources