2.1. Presentation of the NewsEye Platform
2.1.4. The datasets pages
When working on a particular research topic, users need to find documents relevant to the questions they are asking. These documents can be found by performing searches on the NewsEye platform. An essential part of this process is being able to save these documents (or references to them). The platform supports this through "datasets". Users can create as many datasets as they need, either to gather the documents that help answer a particular research question or to group documents that belong together for their own purposes.
After clicking the "Datasets" link in the menu, you are taken to the datasets page, where several actions are possible:
- you can create a new empty dataset
- you can import a dataset publicly shared by another user
- you can access the compound articles you previously created
- you can access your previously created datasets
The datasets show page
After clicking on an existing dataset, you can view its content as a list, similar to the one on the search results page. You can also share your dataset with other platform users by setting its status to "public". The named entities in your dataset are presented in the same way as on the document show page.
Last but not least, you can export your dataset. This allows you to keep a local copy or to analyse the documents with other tools. Two formats are available. The ZIP export gathers the documents of the dataset as a list of files, each containing the text of one document. The JSON export contains more information than the ZIP export: in addition to the text, date and named entities of each document, it includes an IIIF link to the original image.
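To give an idea of what working with these exports in Python can look like, here is a minimal sketch. It rests on several assumptions that are not confirmed by the platform documentation: that the files in the ZIP export are UTF-8 plain text, that the JSON export is a list of objects, and that each object stores its date as a string beginning with a four-digit year under a key called "date". The function names, the "date" key and the demo data are all hypothetical, so check them against a real export before relying on them.

```python
import io
import json
import zipfile
from collections import Counter


def read_zip_export(zip_source):
    """Map each file name in a ZIP export to its decoded text.

    `zip_source` is a path or a binary file-like object. The files are
    ASSUMED to be UTF-8 plain text; undecodable bytes are replaced.
    """
    texts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip folder entries, keep only files
            raw = archive.read(info)
            texts[info.filename] = raw.decode("utf-8", errors="replace")
    return texts


def count_documents_per_year(json_text, date_key="date"):
    """Count documents per year in a JSON export.

    ASSUMES the export is a list of objects whose `date_key` value is a
    string starting with a four-digit year. The key name is hypothetical.
    """
    documents = json.loads(json_text)
    years = Counter()
    for document in documents:
        date = str(document.get(date_key) or "")
        year = date[:4] if date[:4].isdigit() else "unknown"
        years[year] += 1
    return dict(years)


# In-memory demo data, so the sketch runs without a real export.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("doc_1.txt", "First document text")
    archive.writestr("doc_2.txt", "Second document text")
buffer.seek(0)
demo_texts = read_zip_export(buffer)

demo_json = json.dumps([
    {"date": "1915-03-02", "text": "..."},
    {"date": "1915-07-14", "text": "..."},
    {"date": "1921-01-05", "text": "..."},
])
demo_counts = count_documents_per_year(demo_json)

print(sorted(demo_texts))
print(demo_counts)
```

With a real export, you would pass the path of the downloaded file instead of the in-memory buffer, for example read_zip_export("my_dataset.zip"), where the file name is only a placeholder.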
Adding documents to a dataset
On the search results page, after a query, you can use the "Datasets handling" panel to add one or several documents to the dataset you are currently working on. You can select documents individually by clicking on them, or add all documents to the dataset at once. After a search, a small tag next to each document in the results list shows whether it is already part of one or more datasets. Documents (issues, articles and compound articles) can also be added to a dataset in the same way from an individual document's show page.
The experiment page
This section of the platform allows you to analyse the datasets you have previously created. It is a work in progress, and only a few tools are available so far. In the rest of the course, we will create a dataset, add documents to it, export it, and analyse it using Python.