On January 18, Roberto Di Cosmo will give a talk titled “What would you do with billions of source code files? News from Software Heritage”.

The event will take place in Klasserommet@Simula at 11:00, January 18.


From ten years of work analysing the characteristics of large open source software repositories, we draw some lessons on the key properties needed for this kind of large-scale software engineering study. This led us to launch Software Heritage, the most ambitious project to date to build a universal knowledge base of software source code. The size of this archive is daunting, with billions of unique source code files coming from tens of millions of repositories. We detail the mission of Software Heritage and highlight some of the new challenges and opportunities, both organisational and scientific, that it raises.

Short biography

After obtaining a PhD in Computer Science at the University of Pisa, Roberto Di Cosmo was an associate professor for almost a decade at École Normale Supérieure in Paris, and became a full professor of Computer Science at University Paris Diderot in 1999. He is currently on leave at Inria.

He has been actively involved in research in theoretical computing, specifically in functional programming, parallel and distributed programming, the semantics of programming languages, type systems, rewriting and linear logic. His main focus is now on the new scientific problems posed by the general adoption of Free Software, in particular the static analysis of large software collections, which was at the core of the European research project Mancoosi.

Following with great interest the evolution of our society under the impact of IT, he is a long-term Free Software advocate, contributing to its adoption since 1998 with the best-seller Hijacking the world, seminars, articles and software. He created the Free Software thematic group of Systematic in October 2007, and has been director of IRILL, a research structure dedicated to Free and Open Source Software quality, since 2010.

In 2016, he co-founded Software Heritage, an initiative to build the universal archive of all publicly available source code, which he now directs.