Datascope | Data Stories

Background

Datascope is a web application designed to help journalists uncover the “data” potential hidden inside an article. The tool combines local Natural Language Processing (NLP) with generative AI to extract key entities from a text, compute a “datafication” score, suggest editorial angles, and recommend relevant open data sources.

The goal is to provide a lightweight assistant that supports the investigation and ideation phase, turning a raw article into actionable leads for data journalism.

🔗 Live app: datascope.onrender.com

Features

Entity extraction: automatic detection of people, places, dates, numbers and organisations via local NLP
Datafication score: a synthetic indicator that estimates how much data potential an article holds
Editorial angle suggestions: LLM-powered generation of journalistic angles
Open data sources: recommendations of relevant APIs or datasets to explore further
Multilingual support: analysis of articles in French and English (selectable through the UI)
Multi-format input: handles raw text, PDF files and Word documents
Export: download the analysis as Markdown or JSON

Tech stack

Backend and NLP

Python 3.12 and Flask for the web application
spaCy for local language processing (fr_core_news_sm, en_core_web_sm)
OpenAI API for editorial angles and content suggestions
pdfminer and python-docx for multi-format input handling

Quality and testing

pytest for unit and integration tests
Test coverage above 80% (pytest-cov)
black for code formatting

Deployment

Containerised with Dockerfile and docker-compose.yml
Deployed in production on Render.com
Simple email-based authentication

Architecture

The project is built around clearly separated modules:

app/: Flask routes, views and Jinja2 templates
core/: processing pipeline (NLP, LLM calls, parsing, exports)
tests_pytest/: unit and integration tests
uploads/: temporary storage for submitted articles

This separation allows the analysis layer (NLP + LLM) and the web interface to evolve independently.

Project status

Deployed and live. The application is publicly accessible and already covers the main use cases: multi-format import, bilingual analysis, angle generation and export. Future iterations will focus on enriching the catalogue of open data sources proposed and refining the datafication score.

Useful links

Live app: datascope.onrender.com
Source code: github.com/hericlibong/datascope_project

Tags: journalism data nlp flask python spacy openai llm datafication