
DSA Ad Repository Data Sprint

A data sprint on platform advertising registries took place at the médialab during the week of 24 February 2025. Organised by The Open Institute for Digital Transformations and by the project ‘Putting the DSA into Practice’, the event aimed, on the one hand, to test the existence and availability of “Ad Libraries” on social networking platforms and, on the other, to see whether new research questions could emerge from this new data.


The Digital Services Act (DSA) is a European regulation that aims to reduce the spread of illegal content and to enhance transparency between online platforms and their users. It introduces two key requirements: all advertisements must be clearly identifiable, with transparent disclosure of advertisers and funding sources (Art. 26); and very large platforms must maintain public ad libraries (Art. 39). These obligations have applied to very large platforms since 25 August 2023 and to all platforms since 17 February 2024.

This event brought together six research groups, each focusing on a particular subject.

Policy violations in the post-DSA ad ecosystem

Building upon the previous work conducted by AI Forensics, this group focused on health-related scams disseminated through advertisements on Meta platforms. From the Meta Ad Library, which comprises over 470 million advertisements, the group identified duplications, i.e. instances where multiple pages display the same advertisements. By conducting keyword searches, they identified over 46,000 health-related scam advertisements that were shown to EU users over 292 million times.
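As a rough illustration of this pipeline, the sketch below groups ads by identical creative text to surface duplications and then filters the duplicated creatives with a keyword list. The column names and keywords are invented for the example and are not the group’s actual data schema or dictionaries.

```python
# Minimal sketch of the two steps described above, assuming the Ad Library
# records were exported to a CSV with hypothetical columns "page_id" and
# "ad_text"; the keyword list is illustrative only.
import pandas as pd

HEALTH_SCAM_TERMS = ["miracle cure", "diabetes reversal", "lose weight fast"]

ads = pd.read_csv("meta_ads.csv")  # hypothetical export of Ad Library records

# Step 1: find duplications, i.e. identical ad texts run by several pages.
dupes = (
    ads.groupby("ad_text")["page_id"]
    .nunique()
    .loc[lambda n: n > 1]  # keep texts shared by more than one page
)

# Step 2: keyword search over the duplicated creatives.
pattern = "|".join(HEALTH_SCAM_TERMS)
flagged = ads[
    ads["ad_text"].isin(dupes.index)
    & ads["ad_text"].str.contains(pattern, case=False, na=False)
]
print(f"{len(flagged)} candidate scam ads across {flagged['page_id'].nunique()} pages")
```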

These advertisements violate Meta’s community standards and numerous advertising policies, yet Meta reviewed and approved them. The group’s work, covered in the press and commented on by Meta, documents Meta’s systemic failure to moderate its advertising ecosystem, with foreseeable negative consequences for public health, and raises compliance concerns under the Digital Services Act.

More information: https://aiforensics.org/uploads/meta_health.pdf

Biotech ads 

This group investigated how biotech investments and future-oriented products are advertised across four major platforms: X, Meta (Facebook & Instagram), Google and TikTok. They found the platforms’ ad archive APIs to be inconsistent, poorly documented and often unreliable, and much of the retrieved data unstructured or confusing. In some cases, they stumbled upon unexpected patterns, such as recurring Japanese or Polish tropes. Overall, the epistemic value of the data in its current form was close to nil.

While the datasets clearly contained signals, any serious research effort was hindered by problems on both the input and output sides: researchers could neither trace how the datasets were constructed nor retrieve and contextualize the ads at scale. Despite these challenges, the presence of questionable or unlawful content suggests a lack of enforcement by the platforms. The group suggests that future research focus on a single platform or topic and develop a more aggressive ad-hoc toolkit to retrieve the data directly from the platforms’ web interfaces.
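For reference, a minimal query against one of these archives, the Meta Ad Library API, might look like the sketch below. The API version, field list and search term are best-effort illustrations rather than the group’s actual pipeline, and a valid access token from a verified Meta developer account is required.

```python
# Hedged sketch of a Meta Ad Library API query; parameters are illustrative.
import requests

ACCESS_TOKEN = "..."  # must be obtained through Meta's developer programme

params = {
    "search_terms": "biotech",
    "ad_reached_countries": '["FR"]',
    "fields": "id,page_name,ad_creative_bodies,ad_delivery_start_time",
    "limit": 100,
    "access_token": ACCESS_TOKEN,
}
resp = requests.get("https://graph.facebook.com/v19.0/ads_archive", params=params)
resp.raise_for_status()
for ad in resp.json().get("data", []):
    print(ad.get("page_name"), ad.get("ad_creative_bodies"))
```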

Political content in TikTok ads: Romanian 2024 presidential election

Despite TikTok’s terms and conditions prohibiting political content, this group uncovered how political messaging still found its way onto the platform during Romania’s 2024 presidential election. They scraped all commercial content published in Romania between October and December 2024, downloading cover images and running Optical Character Recognition (OCR) to extract the text embedded in them. They searched in particular for posts related to candidate Călin Georgescu, whose first-round victory was annulled due to suspected Russian manipulation on TikTok.
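A minimal sketch of the OCR step might look as follows, assuming the cover images have already been downloaded locally. Here pytesseract (with the Romanian language pack installed) stands in for whatever OCR engine the group actually used, and the keyword list is illustrative.

```python
# Run OCR over downloaded cover images and flag those mentioning tracked terms.
from pathlib import Path

import pytesseract
from PIL import Image

KEYWORDS = ["georgescu", "alegeri", "democrație"]  # illustrative Romanian terms

hits = []
for img_path in Path("covers").glob("*.jpg"):  # hypothetical download folder
    text = pytesseract.image_to_string(Image.open(img_path), lang="ron").lower()
    if any(kw in text for kw in KEYWORDS):
        hits.append(img_path.name)

print(f"{len(hits)} cover images mention the tracked terms")
```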

Their findings reveal that Georgescu’s campaign on TikTok bypassed the platform’s restrictions, promoting the candidate through content that referenced elections, democracy and national identity. The campaign’s mottos and ideas circulated via these ads, even though TikTok’s terms prohibit such content, and there is no information about who financed these videos. The group concludes that its approach could be replicated in other national contexts.

The code and data are available in a GitHub repository.

Targeted dimensions (Meta)

This project explored how Meta ads leverage different types of calls to action and employ specific identity cues to trigger behaviors in relation to societal issues. By analyzing ads along three targeted dimensions (call to action, societal issue and identity triggers), the group wanted to understand how political and civic engagement is shaped. This approach makes it possible to examine how ads not only present issues but also attempt to activate specific responses, from signing a petition to attending a protest, often through emotionally resonant identity-based messaging.

The findings revealed notable correlations, first between societal issues and types of call to action: for example, ads calling for petitioning were linked to environmental causes, while protest-oriented calls to action appeared more often than expected in ads focused on equality. Identity dimensions were also linked to societal issues. Although only a small subset of ads contained explicit calls to action, the three-dimensional method offers an alternative to Meta’s classification by distinguishing mobilization-related ads with greater precision. The analysis faced limitations due to its reliance on lexicon-based methods and pretrained models like manifestoBERTa, which are not fully optimized for the advertising domain. Future work could involve an actor-based approach and expanded datasets to enhance the detection of mobilization strategies.
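To make the lexicon-based side of this coding scheme concrete, the toy sketch below tags an ad with call-to-action types based on cue phrases. The lexicon is invented for the example and is far cruder than the group’s combination of dictionaries and pretrained models.

```python
# Toy lexicon-based call-to-action detector; cue phrases are illustrative.
CTA_LEXICON = {
    "petition": ["sign the petition", "add your name"],
    "protest": ["join the march", "take to the streets", "rally"],
    "donate": ["donate", "chip in"],
}

def detect_cta(ad_text: str) -> list[str]:
    """Return every call-to-action type whose cue phrases appear in the ad."""
    text = ad_text.lower()
    return [cta for cta, cues in CTA_LEXICON.items() if any(c in text for c in cues)]

print(detect_cta("Stand with us: sign the petition for cleaner air today!"))
# -> ['petition']
```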

Gender stereotypes in job ads

This team investigated how gender bias manifests in online job advertising. They first wanted to establish whether gender-based targeting is allowed at all, as the rules vary between platforms. To explore these dynamics, the group collected 700 job ads and analyzed both their content (text, images, video, audio) and their delivery metrics. Using the INSEE classification of job categories, they examined who appeared in the ads and which audiences were reached.

Their findings revealed stark patterns: men are shown job ads more often, particularly ads for executive or white-collar roles. Women, in contrast, appear more often in the content of the ads (about 50% feature women, compared to 30% featuring men), and the gender of the person shown strongly influences who sees the ad: ads featuring women reach more women, and vice versa. Future work may include refining the job and content classifications, conducting robustness checks, and examining platform-specific differences to better understand the structural roots of these biases.
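A simplified sketch of the delivery-side comparison is given below, assuming each collected ad carries a demographic breakdown similar to the Ad Library’s demographic_distribution field; the record structure and numbers are illustrative, not the team’s data.

```python
# Compare the average share of female delivery per job category,
# assuming a simplified per-ad demographic breakdown.
ads = [
    {"job_category": "executive", "demographics": [
        {"gender": "male", "percentage": 0.7},
        {"gender": "female", "percentage": 0.3},
    ]},
    # ... one entry per collected ad
]

def female_share(ad: dict) -> float:
    """Fraction of an ad's delivery that reached women."""
    return sum(d["percentage"] for d in ad["demographics"] if d["gender"] == "female")

by_category: dict[str, list[float]] = {}
for ad in ads:
    by_category.setdefault(ad["job_category"], []).append(female_share(ad))

for cat, shares in by_category.items():
    print(cat, sum(shares) / len(shares))
```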

Unlabeled Politics: Meta’s hidden ad ecosystem in France

This project explored the prevalence and risks of unlabeled political advertising on Meta platforms in France. Political ads that are not properly labeled can pose critical threats: circumventing campaign finance laws, spreading covert influence, obscuring accountability and eroding public trust. Despite existing regulations such as the DSA, the definition of “political content” remains vague. The group created a list of 160 political keywords covering French political, electoral and social issues and matched these terms against a dataset of over 15 million French-language Meta ads. They found that 30% of these ads could be considered politically relevant, yet were not labeled as such.
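The matching step can be illustrated with a short sketch: the three keywords below stand in for the group’s 160-term list, and word boundaries prevent matches inside longer words.

```python
# Keyword matcher with word boundaries; the keyword list is illustrative.
import re

POLITICAL_KEYWORDS = ["immigration", "retraites", "référendum"]
pattern = re.compile(
    r"\b(" + "|".join(map(re.escape, POLITICAL_KEYWORDS)) + r")\b",
    re.IGNORECASE,
)

def is_politically_relevant(ad_text: str) -> bool:
    return bool(pattern.search(ad_text))

print(is_politically_relevant("Réforme des retraites : signez maintenant"))  # True
```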

The analysis revealed that, among the top 300 advertisers ranked by ad spend and proportion of political content, 47 had published politically relevant ads without proper labeling. This suggests not only enforcement inconsistencies but also a potential systemic bias in moderation practices. The team concludes that Meta’s current transparency measures fall short, and it calls for clearer definitions, stronger enforcement and improved data access to ensure accountability.