References and Bibliography in two distinct chapters of Quarto book - lua

I would like to put References and Bibliography in two distinct chapters of the book. The "References" are the things actually cited in the text, but the "Bibliography" is just a manually created chapter before or after the References chapter.
So I would like to write a chapter file, bib.qmd, like:
# The bibliography
#source1
#source2
#source3
... etc
However, I haven't found a way to obtain the full entry from a citation; I only get the author or a number, depending on the CSL style. Obviously I could write all that content by hand, but I would prefer to generate it from the corresponding citation.
I have read about including uncited items, and that sounds like what I want, but I need them in a different chapter, not merged into the references.
I'm thinking of writing a Lua filter to run after Quarto's citeproc and somehow reuse citeproc's output, but I'm not sure whether this is a viable path.

The idea to use a Lua filter is a good one. The biggest challenge is to collect all uncited items. You'd first get the full list of available items with the pandoc.utils.references function, collect all used keys by filtering on all Cite keys, and then use pandoc.utils.citeproc to generate and process a document with the uncited references.
If you have all uncited items in a single .bib file then you could use a pre-existing filter like multibib. Otherwise you might be able to adapt that filter to fit your requirements.
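The key-collection step can be sketched outside of pandoc. The following is a minimal Python illustration, not the Lua filter itself: the function names and the simplified citation-key pattern are assumptions made for the example. In a real filter, the list of available keys would come from pandoc.utils.references and the used keys from the Cite elements of the document.

```python
import re

# Simplified pattern for pandoc-style citation keys such as @smith2020.
# The lookbehind avoids matching e-mail addresses like user@example.org.
CITE_KEY = re.compile(r"(?<![\w.])@([A-Za-z0-9_][A-Za-z0-9_:.-]*)")


def cited_keys(text):
    """Return the set of citation keys that appear in the text."""
    # Strip trailing punctuation that belongs to the sentence, not the key.
    return {m.group(1).rstrip(".:-") for m in CITE_KEY.finditer(text)}


def uncited_keys(all_keys, text):
    """Return the keys from all_keys that are never cited, in original order."""
    used = cited_keys(text)
    return [key for key in all_keys if key not in used]


# Hypothetical bibliography keys and chapter text, for illustration only.
available = ["smith2020", "doe99", "lee_2001", "knuth84"]
chapter = "See @smith2020 and [@doe99; @lee_2001, p. 3]."
print(uncited_keys(available, chapter))
```

The resulting list of uncited keys is what would then be handed to citeproc to render the separate bibliography chapter.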

Related

Recognizing language patterns in a list of sentences on Google Sheets

I am trying to analyze a series of sentences by identifying the most common adverb-adjective-noun strings. I have managed to get answers on how to do so with random words, but I think this is a standalone question and might be better dealt with separately.
In this case, I would like to omit common word types like personal pronouns, articles, prepositions and even verbs. Ideally, the results should produce:
Most common nouns
Most common adjectives
Most common adverbs
Most common adjective+noun strings
Most common adverb+noun strings
I understand there is a way to do this by using an online dictionary but I have been unable to integrate that in my code to get the results I want. Is there any way of automating this without listing all the words that you want omitted? How could it be done?
Here's a link to the spreadsheet I'm using (for this particular query, see page 2) and a screenshot of the types of text I would like to analyze with a manual color-coded visualization of what I want to achieve:

how to create a replicable, unique code for a pre-ISBN book

I am putting my collection of some 13000 books in a MySQL database. Most of the copies I possess
can be identified uniquely by ISBN. I need to use this distinguishing code as a foreign key into
another database table.
However, quite a few of my books date from pre-ISBN ages. So for these, I am trying to devise a
scheme to uniquely assign a code, sort of like an SKU.
The code would be strictly for private use. It should have the important property that, when I
obtain a pre-ISBN publication, I could build the code from inspecting the work, and based on the
result search the database to see if I already have other copies in my possession.
Many years ago I think I saw a search scheme for some university(?) catalogue, where you could
perform a search of a title based on a concatenated string (or code) that was made up of, let's
say 8 letters from the title, and 4 from the author, and maybe some other data. For example,
to search 'The Nature of Space and Time' by Stephen Hawking and Roger Penrose you might perform
a search on the string 'Nature SHawk', being comprised of 8 characters from the title (omitting
non-filing words and stopwords) and 4 from the author(s).
I haven't been able to find any information on such schemes, or whether or not such an approach
was standardized in any way.
Something along these lines could be made up of course, but I was wondering if people here have
heard of such schemes, or have ideas on how to arrive at a solution.
So keep in mind the important property of 'replicability': using the scheme, inspection of a pre-
ISBN dated work should --omitting very special or exclusive cases-- in general lead to a code
that can singly be used to subsequently determine if such a copy is already in the database.
Thank you for your time.
Just use the title (optionally with author and publisher) and a series id to produce a fake ISBN. Take a look at fake_isbn.
NOTE: use the first digit as a series id but don't use 9!
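The title-plus-author scheme described in the question can be sketched as follows. This is a minimal Python illustration under stated assumptions: the stopword list, the function name make_code, and the padding rule are invented for the example, and it is not the fake_isbn approach mentioned above.

```python
import re

# Assumed list of non-filing words and stopwords; extend as needed.
STOPWORDS = {"a", "an", "the", "of", "and", "or", "in", "on", "to", "for"}


def make_code(title, author_surname, title_len=8, author_len=4):
    """Build a lookup code from the title and the first author's surname."""
    # Keep only alphanumeric words that are not stopwords.
    words = [w for w in re.findall(r"[A-Za-z0-9]+", title)
             if w.lower() not in STOPWORDS]
    # Join with single spaces, then cut or pad to a fixed width.
    title_part = " ".join(words)[:title_len].ljust(title_len)
    letters = re.sub(r"[^A-Za-z]", "", author_surname)
    author_part = letters[:author_len].ljust(author_len)
    return title_part + author_part


print(make_code("The Nature of Space and Time", "Hawking"))  # Nature SHawk
```

Because the code depends only on what is printed on the book, the same inspection should yield the same string each time. Folding the result to a single case before storing it would make lookups less sensitive to how the title is typed.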

CSV bounded source with a custom line delimiter

I want to read a CSV file with a line delimiter other than the default one. Each CSV record spans multiple lines, so TextIO.Read does not suffice.
Should I extend FileBasedSource, or is there an existing CsvBasedSource (with a custom line/field delimiter)?
I was looking into the splitIntoBundles() API. XmlSource does not override isSplittable(), so it can be split into bundles. I was wondering how XmlSource handles this, because a split can happen in the middle of a <record>, as the split is based on the desiredBundleSize only.
That's correct: this will need a custom FileBasedSource implementation to work. Regarding XmlSource, record and root element names have to be unique (i.e. no other elements can have those names). We'll update the documentation to reflect that, and look at improving this in the future.
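The core of such a reader is splitting a stream on an arbitrary delimiter while coping with records (and delimiters) that straddle read boundaries. Below is a minimal Python sketch of that logic only. It is not the Dataflow API, and read_records, its parameters, and the sample delimiter are assumptions made for illustration. A custom FileBasedSource would additionally have to deal with split points between bundles.

```python
import io


def read_records(stream, delimiter, chunk_size=4096):
    """Yield records from a text stream separated by an arbitrary delimiter."""
    buffer = ""
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        buffer += chunk
        parts = buffer.split(delimiter)
        # The last part may be an incomplete record or end in a partial
        # delimiter, so keep it in the buffer until more data arrives.
        buffer = parts.pop()
        for record in parts:
            yield record
    if buffer:
        # Final record without a trailing delimiter.
        yield buffer


# Records span several lines and end with "|\n" instead of a bare newline.
data = io.StringIO("a,1\nline2|\nb,2|\nc,3")
print(list(read_records(data, "|\n", chunk_size=3)))
```

A small chunk_size is used in the example only to exercise the boundary handling.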

Database design for book structure (table of contents) and content

I have a list of entries, which can be thought of as paragraphs from a book, stored as separate objects of the same class. These objects have a ‘num’ property, along with the actual text, so that I know their order and can later display them as a list in the correct order (1,2,3, …).
Now I want to bring this one step further and be able to ‘record’ the structure of the book, like the table of contents. In other words, say the book is divided into chapters, and each chapter is further divided into sections. The first few paragraphs are found under Ch.1 Sec.1, then Ch.1 Sec. 2, and so on all the way to Ch. n, S. m. What I’m not sure of is what’s a good way to record this information? I've been told that I should use a database with SQL but I'm not sure where to begin.
The implementation must allow me to ‘quickly’ determine the following two things at any point: (1) Given a chapter and section #, what paragraphs are contained within this section? (2) Given a paragraph #, which chapter and section is it under? It must also be flexible enough that I could use the same platform in the future with few edits if the structure (depth-wise) of the book changes (e.g. sections are divided into subsections, etc.). Finally, it should be able to handle optional divisions (i.e. some sections have subsections while others do not).
This is for an iOS app and my code is written in Objective-C so far.
SQL would certainly be one possibility. If you follow this route, there is a trade-off between flexibility and ease of coding, which affects maintainability. For example, if you build a fixed structure, say with some additional levels attempting to cater for the future, such as:
Book
Chapter
Section
Sub-section
Paragraph
you will have code with unambiguous references, such as section.fk_chapter, paragraph.fk_subSection, etc. This will make it easier to troubleshoot and build queries. However, you will have to refactor a fair amount of code if you want to add, say, sub-paragraphs or sub-sub-sections. Your UI will be simpler to code with this approach, as you always know which "level" you are working at. Alternatively, you can go for a hierarchical approach:
Book
Chapter
Content Item
Content Item
Content Item
....
where the contentItem table has a self-referencing foreign key. This has the considerable advantage of allowing any number of levels. An attribute on the Content Item could tell you the name and "type" of the level you are at, if needed. It is definitely much more flexible, but comes with some complexity in implementation and UI presentation. A column called contentItem.fk_contentItem that refers to the parent level does not tell the coder where they are in the hierarchy. Queries will be a bit more difficult to write. The UI will have to cater for "any" number of levels. On the other hand, these problems are not insurmountable, and many have gone before you on this route.
Your question is quite broad, so opinions will vary on the approach and the above is admittedly very general.
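To make the hierarchical option concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name content_item, its columns, and the sample rows are assumptions made for illustration, and the two helper functions correspond to questions (1) and (2) from the post. The same schema and recursive queries should carry over to SQLite on iOS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE content_item (
    id        INTEGER PRIMARY KEY,
    parent_id INTEGER REFERENCES content_item(id),  -- NULL at the top level
    kind      TEXT NOT NULL,     -- 'chapter', 'section', 'paragraph', ...
    num       INTEGER NOT NULL,  -- number within its kind
    body      TEXT
);
""")
conn.executemany("INSERT INTO content_item VALUES (?, ?, ?, ?, ?)", [
    (1, None, "chapter", 1, None),
    (2, 1, "section", 1, None),
    (3, 1, "section", 2, None),
    (4, 2, "paragraph", 1, "First paragraph"),
    (5, 2, "paragraph", 2, "Second paragraph"),
    (6, 3, "subsection", 1, None),   # optional extra level
    (7, 6, "paragraph", 3, "Third paragraph"),
])


def paragraphs_under(item_id):
    """(1) Paragraph numbers below a chapter or section, at any depth."""
    sql = """
    WITH RECURSIVE tree(id) AS (
        SELECT ?
        UNION ALL
        SELECT c.id FROM content_item c JOIN tree t ON c.parent_id = t.id
    )
    SELECT num FROM content_item
    WHERE kind = 'paragraph' AND id IN (SELECT id FROM tree)
    ORDER BY num
    """
    return [row[0] for row in conn.execute(sql, (item_id,))]


def containers_of_paragraph(num):
    """(2) The (kind, num) chain above a paragraph, outermost first."""
    sql = """
    WITH RECURSIVE up(id, parent_id, kind, num, depth) AS (
        SELECT id, parent_id, kind, num, 0 FROM content_item
        WHERE kind = 'paragraph' AND num = ?
        UNION ALL
        SELECT c.id, c.parent_id, c.kind, c.num, up.depth + 1
        FROM content_item c JOIN up ON c.id = up.parent_id
    )
    SELECT kind, num FROM up WHERE depth > 0 ORDER BY depth DESC
    """
    return [tuple(row) for row in conn.execute(sql, (num,))]


print(paragraphs_under(1))
print(containers_of_paragraph(3))
```

The recursive queries are where the extra query complexity shows up, but adding a new level such as a subsection needs no schema change.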

Lucene partial word matching

Lucene does not support it out of the box, so I need some help building my query.
Let's say I have a document with a field value "Develop".
I would like this document to be returned for the searches "Dev" and "lop".
Maybe creating two queries?
"*keyword"
and
"keyword*"
and
"keyword"
?
How would you go about doing this with multiple words? Would you split the sentence/search into a words list and do the previous example for each word?
What you're asking is, if I understand you correctly, not feasible on any large-scale search engine.
Lucene creates an index over keywords using term-document matrix and inverted-file techniques (see links at the bottom). A fully fledged string matching might be very nice to have, but it does not scale: you will never be able to query a decently sized index (say more than a couple of dozen/hundreds of documents) in an acceptable time.
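The difference in cost can be seen in a toy inverted index. This is a minimal Python sketch written for illustration, not Lucene's actual data structures, and all function names are assumptions. An exact term lookup is a single dictionary access, while a substring match has to examine every distinct term in the index.

```python
from collections import defaultdict


def build_index(docs):
    """Map each lower-cased term to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index


def exact_search(index, term):
    """One dictionary lookup, independent of the number of terms."""
    return index.get(term.lower(), set())


def substring_search(index, fragment):
    """Scan every distinct term; the cost grows with the vocabulary size."""
    fragment = fragment.lower()
    hits = set()
    for term, doc_ids in index.items():
        if fragment in term:
            hits |= doc_ids
    return hits


docs = {1: "Develop software", 2: "Envelope design"}
index = build_index(docs)
print(exact_search(index, "lop"))      # no term is exactly "lop"
print(substring_search(index, "lop"))  # found inside "develop" and "envelope"
```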
Still, here are two ideas that might help...
Syllable tokenization
To come back to your example with 'Develop': as long as you are happy to let users search by syllable, I guess you can do something.
You would have to create or use a tokenizer that splits the words in your indexed documents into syllables, and build an index over the syllables. (I am not sure there are built-in tokenizers for the English language that can do that, and writing one on your own might be tricky...)
An important thing to note:
If you index the full words AND the separate syllables, your index will be much larger than if you index only one of the two.
However, I would not suggest indexing only syllables. If you also want to allow your users to search for the full word 'Develop' (which I guess you do), this would result in two queries with a logical AND between them, namely <'dev' AND 'lop'>. Although Lucene supports such logical constructs in queries, they are very expensive. I have personally had some trouble in the past using logical queries in Lucene.
Stemming
Another way to arrive at what you're trying to do could be to use a brutal form of word stemming (http://en.wikipedia.org/wiki/Stemming) that stems words to their first syllable. (This would allow searching for 'dev' but not for 'lop'...)
Again, I don't think such a stemming feature is already in Lucene. Writing one yourself will be a pain and will involve working with/importing huge dictionaries.
Links
These might be worth looking into if you don't know about search engine internals:
http://en.wikipedia.org/wiki/Index_%28search_engine%29
http://en.wikipedia.org/wiki/Vector_space_model
http://en.wikipedia.org/wiki/Inverted_file
http://en.wikipedia.org/wiki/Term-document_matrix
http://en.wikipedia.org/wiki/Tf-idf
