Using Big Data Analysis for Visualisations of Public Affairs Content in Online News
One of the main aims of the Media Pluralism and Online News project is to discover methods of computationally evaluating media pluralism in ways that are relevant for multi-platform news ecosystems. In brief, the concept of ‘public affairs’ allows us to separate and identify the kind of content that contributes to media pluralism. We think this term is useful in showing the news content that should be valued (and receive funding): that is, the aspect of news media which constitutes a public good and for which the business model is in a state of crisis.
Classifying content as ‘public affairs’ is therefore a mechanism for identifying material that might be the subject of public subsidy, philanthropy or some form of regulatory intervention. It avoids the traditional distinction between ‘hard news’ and ‘soft news’, which can be useful when material that is usually seen as soft news has a public affairs angle. For example, a sports article might be about the need for public funding, health or corruption in sport. Even articles apparently about celebrities can have some kind of social function. It also avoids the need to distinguish between ‘news’ on the one hand and comment/analysis/current affairs on the other: all such content is covered by the code provided it has a public affairs angle.
Making a distinction between public affairs and non-public affairs content is not meant to suggest non-public affairs has no value – it might, for example, be important in maintaining a sense of community – but it does allow us to identify material that is part of news media’s role in contributing to government, public administration and civic society in a democratic society. Again, media more broadly contributes to other areas of life – for example, Australian drama programs contribute to Australia’s cultural life, while children’s programs contribute to childhood learning – but the target of subsidies and other interventions in relation to news is this public affairs purpose. Importantly, this content must be provided by news media organisations that employ professional journalists, with some established presence (for example, at least 12 months of operation), and possibly some qualifying threshold level of public affairs content. Book publishers, bloggers and others also create content that deals with public affairs, but this project seeks to support news media which demonstrate additional features such as immediacy, verification of sources and trained journalists working to professional standards.
We define public affairs (relative to non-public affairs) in the following way:
Public affairs reporting conveys timely factual and opinion-based information about events and issues in government, politics, business, and public administration. This will include education, health, science and other matters that have broad social significance. Examples are items that cover contentious public debates on climate change, immigration, and land use.
Non-public affairs reporting conveys timely factual and opinion-based information about topics of entertainment, art and culture, leisure and lifestyle. This will include sport, well-being, fashion, and music. A sports article that just gives sports results or commentary, for example, will be non-public affairs, unless it has a public affairs angle such as government funding or health concerns.
It should also be noted that public affairs (PA) content is different from ‘public interest journalism’, although the two often overlap. We think ‘public interest journalism’ is a very useful term to describe a certain activity and the content that results; however, sometimes it is seen as a narrow concept (in the sense of investigative reporting) while at other times it is seen as expansive (for example, as all hard and soft news, provided it is produced by journalists). ‘Public affairs content’ might be seen as sitting in the middle, but in essence it is any content, produced by a news media organisation, that has some public affairs angle. In the example given above, it would cover an article that is largely about sports results if the article also dealt with the lack of suitable facilities at a sports ground. While such a report is included in some definitions of ‘public interest journalism’, it is excluded from others. We recognise that these definitions are still in development and over time they may coalesce further. It may be that a definition produced by the Public Interest Journalism Initiative (PIJI), for example, is the best fit for regulatory settings, and that our tool could be adapted in order to identify PIJI’s formulation of ‘public interest journalism’.
So these terms, whether ‘public affairs’, ‘public interest’ or some other formulation, remain useful for various applications of public policy and regulation. At its edges, any such category would include some commercial content, as well as other forms of content which are likely to be commercially viable or do not otherwise deserve regulatory intervention and financial support.
Identifying ‘public affairs content’
Our ARC Media Pluralism and Online News Discovery Project team has been working with data scientists from the Sydney Informatics Hub (SIH) at the University of Sydney to develop a computational tool that allows us to collect content from the homepages of news websites and to analyse how much of it is public affairs content. We are also working on other functions for the tool in order to provide a richer picture of media diversity.
The dataset we are working with for the prototype version of the tool is the entire data scrape of the homepages of the top 20 news sources as identified by Roy Morgan Single Source News data. The data spans three months in 2019, during which the Australian Federal Election was underway. We currently have four cross sections of the data: six-hourly, daily, weekly and monthly. The tool classifies the content into public affairs and non-public affairs. For each publication, it reports the proportion of the total number of articles that are classified as public affairs.
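As a rough illustration of the per-publication proportion described above, the following Python sketch computes the share of public affairs articles for each publication from records that have already been classified. The record layout, the publisher names and the "PA"/"NPA" label strings are assumptions made for this example and are not the tool's actual data format.

```python
from collections import Counter


def pa_share_by_publication(records):
    """Return {publication: fraction of its articles labelled "PA"}.

    `records` is an iterable of (publication, label) pairs, where label
    is "PA" or "NPA". This layout is hypothetical.
    """
    totals = Counter()
    pa_counts = Counter()
    for publication, label in records:
        totals[publication] += 1
        if label == "PA":
            pa_counts[publication] += 1
    # Every publication in `totals` has at least one article,
    # so the division is always defined.
    return {pub: pa_counts[pub] / totals[pub] for pub in totals}


# Made-up example data, for illustration only.
sample = [
    ("Publisher A", "PA"), ("Publisher A", "NPA"),
    ("Publisher A", "PA"), ("Publisher A", "PA"),
    ("Publisher B", "NPA"), ("Publisher B", "PA"),
]
shares = pa_share_by_publication(sample)
```

With this made-up sample, three of Publisher A's four articles and one of Publisher B's two articles are labelled public affairs.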
The SIH data scientists, in consultation with project investigators, have built a basic text classifier for Public Affairs (PA) vs Non-Public Affairs (NPA) classification of media articles. The current project has made iterative improvements on the performance of previous project classifiers, in particular in how robust a PA/NPA classifier is to changes in the time of publication. This means that the classifier's predictions can be carried over to articles published in the future. In addition, the project also evaluates inter-annotator reliability to gauge how well the notion of PA/NPA converges among experts in the field. Finally, the approaches were also tested on the Primary topic category to explore the potential for including Primary topic prediction in an operational news article classification system.
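The text above does not say which reliability statistic the project uses. As one common option for two annotators, the sketch below computes Cohen's kappa, which corrects the raw agreement rate for the agreement expected by chance. The function name and the example labels are hypothetical.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement, from each annotator's own label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = freq_a.keys() | freq_b.keys()
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


# Made-up annotations of eight articles by two experts.
expert_1 = ["PA", "PA", "NPA", "NPA", "PA", "NPA", "PA", "PA"]
expert_2 = ["PA", "NPA", "NPA", "NPA", "PA", "NPA", "PA", "PA"]
kappa = cohens_kappa(expert_1, expert_2)
```

In this made-up example the experts agree on seven of eight articles (0.875), chance agreement is 0.5, and kappa is therefore 0.75.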
Figure 1 below is a de-identified modelled representation of one of the outputs of the tool, allowing us to visualise relative public affairs content using one of the timeframes (e.g. daily, weekly) averaged across the three-month span of the source data. Figure 2 shows the variation in public affairs content of all sources across an eight-week period.
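One way to read "averaged across the span" is to compute the public affairs share within each time window and then take the mean of those shares for each source. The sketch below follows that reading; it is an assumption about the averaging, not a description of how the tool produces its figures, and the record layout and helper names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def mean_pa_share_per_period(records, period_key):
    """Average the per-period PA share for each publication.

    `records` holds (publication, day, label) tuples (hypothetical
    layout); `period_key` maps a date to a bucket such as a day or week.
    """
    counts = defaultdict(lambda: [0, 0])  # (pub, bucket) -> [PA, total]
    for publication, day, label in records:
        cell = counts[(publication, period_key(day))]
        cell[1] += 1
        if label == "PA":
            cell[0] += 1
    shares = defaultdict(list)
    for (publication, _bucket), (pa, total) in counts.items():
        shares[publication].append(pa / total)
    return {pub: sum(vals) / len(vals) for pub, vals in shares.items()}


def iso_week(day):
    """Bucket a date by ISO year and ISO week number."""
    iso = day.isocalendar()
    return (iso[0], iso[1])


# Made-up records: three articles in one week, one in the next.
sample = [
    ("Publisher A", date(2019, 5, 6), "PA"),
    ("Publisher A", date(2019, 5, 6), "PA"),
    ("Publisher A", date(2019, 5, 7), "NPA"),
    ("Publisher A", date(2019, 5, 13), "PA"),
]
weekly = mean_pa_share_per_period(sample, iso_week)
daily = mean_pa_share_per_period(sample, lambda d: d)
```

The choice of window matters: in this made-up sample the weekly average is higher than the daily average, because the single non-public-affairs article makes up a whole day but only a third of its week.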
As noted above, we have designed this tool in order to provide one measure of media diversity in digital multi-platform news ecosystems.
The following screenshots illustrate the functionality of the Media Pluralism Dashboard tool: the percentage of PA and NPA article content for individual publishers; the same comparison between media publisher groups; and a feature we are calling ‘Search analysis’, which lets us compare publishers and zoom in on the articles themselves using specific search terms. We have not examined the broader significance of these analyses here, since their purpose is to indicate the developing uses of the dashboard tool, but we certainly welcome any feedback (using the Contact Us page).
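A search of this kind can be approximated with a simple filter. The sketch below assumes that terms joined by "&" must all appear in an article and that results can be restricted to chosen publishers; the dashboard's actual matching rules are not described here, and the article fields and function names are hypothetical.

```python
import re


def matches_all_terms(text, terms):
    """True if every term occurs as a whole word (case-insensitive)."""
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    return all(term.lower() in words for term in terms)


def search_articles(articles, terms, publishers=None):
    """Return articles containing all `terms`, optionally by publisher.

    Each article is a dict with 'publisher', 'title' and 'text' keys
    (a hypothetical layout).
    """
    hits = []
    for article in articles:
        if publishers is not None and article["publisher"] not in publishers:
            continue
        if matches_all_terms(article["title"] + " " + article["text"], terms):
            hits.append(article)
    return hits


# Made-up articles, for illustration only.
articles = [
    {"publisher": "Publisher A", "title": "Morrison outlines climate policy",
     "text": "The Prime Minister said climate change targets would be met."},
    {"publisher": "Publisher B", "title": "Climate rally draws crowds",
     "text": "Protesters called for change."},
    {"publisher": "Publisher C", "title": "Morrison on climate change",
     "text": "A short summary of the announcement."},
]
hits = search_articles(articles, ["Climate", "Change", "Morrison"],
                       publishers={"Publisher A", "Publisher B"})
```

Here only the first article is returned: the second does not mention all three terms, and the third comes from a publisher outside the selected set.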
 See, for example: http://www.roymorgan.com/findings/7595-top-20-news-websites-march-2018-201805240521.
Public Affairs Content (PA) v Non Public Affairs (NPA) Distribution by % in Major Online News Media Publishers
Public Affairs Content (PA) v Non Public Affairs (NPA) Distribution by % in Major Online News Media Publisher Groups
Search analysis: Universities and Funding_News.com & SMH.com
Search analysis: Climate & Change & Morrison_News Corp & Nine Ent Titles