Is there any way I can see the distribution over topics (topic mixtures) per document for the Dynamic Topic Model in Spyder using the Gensim module?
I am only aware of 'print_topic_times', which shows one topic (a distribution over words) across all time slices.
However, is there any code that allows me to see the topic mixture of one document for each time slice?
I am not sure about Spyder (it is just an IDE, so the same code works anywhere), but for visualizing topics the easiest option is pyLDAvis.
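If you are working with gensim's LdaSeqModel (its DTM implementation), the per-document topic proportions are available through `doc_topics`. Below is a minimal sketch on an invented toy corpus; note that in DTM each document belongs to exactly one time slice, so a document has a single mixture vector rather than one per slice:

```python
from gensim.corpora import Dictionary
from gensim.models.ldaseqmodel import LdaSeqModel

# Toy corpus for illustration: 4 documents over 2 time slices
# (time_slice gives the number of documents in each slice, in order).
texts = [["bank", "river", "water"], ["bank", "money", "loan"],
         ["river", "water", "flow"], ["money", "loan", "rate"]]
dictionary = Dictionary(texts)
corpus = [dictionary.doc2bow(t) for t in texts]

ldaseq = LdaSeqModel(corpus=corpus, id2word=dictionary,
                     time_slice=[2, 2], num_topics=2)

# Topic mixture (topic proportions) for document 0. Because a document
# lives in a single time slice, this is one vector, not one per slice.
print(ldaseq.doc_topics(0))
```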
I have this timeline from a newspaper produced by my Native American tribe. I was trying to use AWS Textract to produce some kind of table from it, but Textract does not recognize any tables in the document, so I don't think that will work (perhaps more is possible if I pay, but it doesn't say so).
Ultimately, I am trying to sift through all the archived newspapers and download all the timelines for all of our election cycles (both "general" and "special advisory") to find the number of days between each item in the timeline.
Since this is all in the public domain, I see no reason I can't paste a picture of the table here. I will include the download URL for the document as well.
Download URL: Download
I started off by using Foxit Reader on individual documents to find the timelines on Windows.
Then I used the tool 'ocrmypdf' on Ubuntu to ensure all these documents are searchable (ocrmypdf --skip-text Notice_of_Special_Election_2023.pdf.pdf ./output/Notice_of_Special_Election_2023.pdf).
Then I happened to see an ad for AWS Textract in my Google news feed this morning and saw how powerful it is supposed to be. But when I tried it, it didn't actually find these human-readable timelines.
I'm wondering whether any ML tools, or even other kinds of solutions, exist for this type of problem.
Mainly, I am trying to keep my tech skills up to par. I was sick for the last two years, and this is a fun problem to tackle that I think is fairly fringe.
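Since ocrmypdf already gives the PDFs a text layer, one low-tech option is to skip table detection entirely: dump the plain text (e.g. with pdftotext) and compute the day gaps between dated timeline entries yourself. A minimal sketch using only the standard library; the sample lines and the date format are invented assumptions, so adjust the regex and format to match your newspapers:

```python
import re
from datetime import datetime

# Stand-in for text dumped from the OCRed PDF (hypothetical content).
sample = """Candidate filing opens          January 5, 2023
Candidate filing closes         January 19, 2023
Ballots mailed                  February 10, 2023
Election day                    March 7, 2023"""

# Find dates like "January 5, 2023" and parse them.
dates = [datetime.strptime(m, "%B %d, %Y")
         for m in re.findall(r"[A-Z][a-z]+ \d{1,2}, \d{4}", sample)]

# Number of days between consecutive timeline items.
for earlier, later in zip(dates, dates[1:]):
    print(f"{earlier:%Y-%m-%d} -> {later:%Y-%m-%d}: {(later - earlier).days} days")
```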
I want to train a machine learning system such as IBM Watson on unstructured data (PDF, txt, HTML) and then ask questions and get answers via API calls. How can I achieve that, with either GUI-based or API-based training? From Bluemix, it is hard to decide which service best fits this requirement. Can you please suggest the best options?
Retrieve and Rank: Retrieve and Rank can surface the most relevant information from a collection of documents. For example, using R&R, an experienced technician can quickly find solutions in dense product manuals, and a contact center agent can quickly find answers to improve average call handle times. The Retrieve and Rank service works "out of the box," but can also be customized to improve the results. More details here
Discovery Service: Extract value from unstructured data by converting, normalizing, and enriching it. Use a simplified query language to explore that data, or quickly tap into pre-enriched datasets like the Discovery News collection. More details here
I would recommend Watson Discovery (https://www.ibm.com/watson/services/discovery) for your purpose.
It's very complete and supports many features in both GUI and API. It supports questions in natural language or in query format.
Its documentation is here: https://console.bluemix.net/docs/services/discovery/getting-started.html#getting-started-with-the-api
If you create a free instance of Watson Discovery, you can test its API here: https://watson-api-explorer.mybluemix.net/apis/discovery-v1
There are examples of each API call here: https://www.ibm.com/watson/developercloud/discovery/api/v1/
There is also a demo and respective code here:
https://discovery-news-demo.mybluemix.net/
and
https://github.com/watson-developer-cloud/discovery-nodejs
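As a concrete starting point, here is a hedged sketch of a natural-language query against the Discovery v1 REST API from Python. The environment ID, collection ID, credentials, and version date are placeholders; take the real values from your Bluemix service instance:

```python
import requests

DISCOVERY_URL = "https://gateway.watsonplatform.net/discovery/api/v1"
ENV_ID = "your-environment-id"    # placeholder from your service instance
COLL_ID = "your-collection-id"    # placeholder from your service instance

# Ask a question in natural language against your collection.
resp = requests.get(
    f"{DISCOVERY_URL}/environments/{ENV_ID}/collections/{COLL_ID}/query",
    params={"version": "2017-11-07",
            "natural_language_query": "How do I reset my password?"},
    auth=("your-username", "your-password"),  # service credentials
)
resp.raise_for_status()

data = resp.json()
print(data["matching_results"])   # number of matching documents
for result in data["results"]:
    print(result.get("id"))
```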
In Chapter 11: Access Types of the book Rendez-vous with Ada by Naiditch (1995), Naiditch gives a rather complete example of how to create a linked list that contains information about a restaurant. I largely understand the data structure by following the book's example, and I can see that any information the user enters into the linked list will only exist during the lifetime of the program. The author does not store any information about the restaurant, say as text files. So what is the use of the linked list example if all information entered by the user is not stored after the user exits the program?
Does it make sense to store user-entered information, say in a text file, and then read it into a linked list so as to do further operations on it? But then operations such as adding or deleting entries will put the list out of sync with the original text file from which it was read at the start.
Thank you.
PS: As you might have noted, I am trying to get a real-life example of a linked list, and I am new to this data structure as well.
As discussed here, the 1995 example predates the addition of Containers to the predefined library of Ada 2005. The textbook example may guide your understanding of the concrete implementations encountered in a particular Ada library. See 8.1 Organization of containers for an overview.
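On the persistence question: the usual pattern is to load the file into the in-memory structure at startup, operate on it there, and write the updated state back when the user is done, so the file only changes when you deliberately save. A language-agnostic sketch (in Python for brevity; the same pattern works with Ada's access types or Containers, and the file name is hypothetical):

```python
class Node:
    def __init__(self, name, nxt=None):
        self.name = name
        self.next = nxt

def load(path):
    # Read lines last-to-first and prepend, preserving file order.
    head = None
    with open(path) as f:
        for line in reversed(f.read().splitlines()):
            head = Node(line, head)
    return head

def save(head, path):
    # Rewrite the file from the current in-memory list.
    with open(path, "w") as f:
        while head is not None:
            f.write(head.name + "\n")
            head = head.next

restaurants = load("restaurants.txt")            # hypothetical data file
restaurants = Node("New Diner", restaurants)     # insert at the front
save(restaurants, "restaurants.txt")             # persist the modified list
```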
I am using Stanford NER to remove identifying names from essays.
It detects names like Werner, but Indian names such as Ram, Shyam, etc. go undetected.
What should I do to make them recognizable?
You should train the NER model on Indian names. I could not find detailed information on how to achieve that, but this FAQ page ( http://nlp.stanford.edu/software/crf-faq.shtml#a ) has some information that may be a starting point for you. Questions 2-3 in particular are directly related to your question.
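As a small illustration of the first step, here is a sketch of preparing training data in the tab-separated format that the FAQ describes for CRFClassifier: one token per line, a tab, then the label, with a blank line between sentences. The sentences and labels here are invented examples; you would label your own essay text:

```python
# Invented labeled sentences: (token, label) pairs per sentence.
labeled_sentences = [
    [("Ram", "PERSON"), ("met", "O"), ("Shyam", "PERSON"),
     ("in", "O"), ("Delhi", "O"), (".", "O")],
    [("Werner", "PERSON"), ("wrote", "O"), ("the", "O"),
     ("essay", "O"), (".", "O")],
]

with open("indian-names.tsv", "w", encoding="utf-8") as f:
    for sentence in labeled_sentences:
        for token, label in sentence:
            f.write(f"{token}\t{label}\n")
        f.write("\n")  # blank line separates sentences
```

You would then point a properties file's trainFile at indian-names.tsv and train with the CRFClassifier class from stanford-ner.jar, as shown in the FAQ.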
I was planning to write a recommender which treats preferences differently depending on contextual information (the time the preference was made, the device used to make it, ...).
In the Mahout in Action book and in the code examples shipped with Mahout, I can't seem to find anything related. In some examples there is metadata (a.k.a. content) used to express user or item similarity, but that's not what I'm looking for.
I wonder if anyone has already made an attempt to do something similar with Mahout?
Edit:
A practical example could be that the current session happens on a mobile device, and this should boost (rating*1.1) all preferences tracked on mobile devices and lower (rating*0.9) preferences tracked otherwise.
...
Another example could be that some ratings are collected implicitly and others explicitly. How would I be able to keep track of this fact without "coding" it directly into the tracked value, and how would I be able to use that information when calculating the scores?
I would say one approach is to use the Rescorer class to do just that, but my guess is that this is what you are referring to when you say that's not what you are looking for.
Another approach would be to pre-process the entire data set to adjust the preferences according to your needs before using Mahout to generate recommendations; a sketch of that idea follows below.
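A hedged sketch of that pre-processing step in Python, assuming a raw log with a device column and using the 1.1/0.9 factors from your example; the column layout and file names are assumptions. The output is the plain user,item,value CSV that Mahout's FileDataModel reads:

```python
import csv

BOOST = {"mobile": 1.1}   # context -> multiplier for preferred contexts
DEFAULT_FACTOR = 0.9      # everything else gets a drop

with open("raw_ratings.csv", newline="") as src, \
     open("mahout_ratings.csv", "w", newline="") as dst:
    reader = csv.DictReader(src)   # expects columns: user,item,rating,device
    writer = csv.writer(dst)
    for row in reader:
        factor = BOOST.get(row["device"], DEFAULT_FACTOR)
        adjusted = float(row["rating"]) * factor
        # Mahout's FileDataModel consumes userID,itemID,value rows.
        writer.writerow([row["user"], row["item"], f"{adjusted:.3f}"])
```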
If you provide some more detail on how you expect to use your data to modify preferences, people here would be able to help even further.