Net Alert


Harmonized Histories?
A year of fragmented censorship across Chinese live streaming applications

  • We reverse engineer three popular live streaming platforms (YY, Sina Show, and 9158) and find keyword lists used to censor chat messages.
  • Tracking changes to the keyword lists over the past year gives an inside look into how these applications implement censorship.
  • Censorship is reactive, often in response to current events.
  • The keyword lists are not identical between companies, suggesting that there is no centralized list provided to them by authorities.
  • Censored keywords include references to collective action and government criticism, serving as a counterpoint to recent studies.
  • Censored keywords include names of competitors, which appears to be motivated by business interests rather than government pressures.


Table of Contents

  1. Introduction
  2. Background
  3. Law and Regulation
  4. Tracking Keyword Censorship
  5. Comparing Keywords Between Platforms
  6. Keyword Content Analysis
  7. Conclusions

This report is also available as a PDF download.


How is social media censored in China?
Our inside look finds overlapping motivations and uneven implementation of controls

Live streaming applications have gained huge popularity in China, with millions of users flocking to them to share karaoke performances, game sessions, and glimpses of their everyday lives. Popular streams attract hundreds of thousands of users who can chat with the live streamers and purchase virtual items to give them. The live streamers can in turn trade those items for cash. These platforms have given rise to a new generation of Internet celebrities who amass audiences, virtual gifts, product endorsements, and even venture capital investment from their video streams. In 2016, the popularity of live streaming in China exploded. The estimated value of the industry is $5 billion. This growing popularity has been met with increased government pressure over prohibited content.

Live streaming applications use a combination of strict terms of service, content monitoring teams, and automated content filtering to comply with regulations. By reverse engineering three popular live streaming applications (YY, Sina Show, and 9158), we reveal one facet of these controls: how the apps automatically censor chat messages. From May 2015 to September 2016 we collected blacklisted keyword lists from each app. If a user enters a keyword from one of these lists in a chat, their message is censored.

Chinese netizens use the word “river crab” (“河蟹” héxiè) as a euphemism for censorship, because it sounds similar to the word “harmonious” (“和谐” héxié), which references former Chinese President Hu Jintao’s justification of censorship as necessary for the creation of a “harmonious society.” When online content is censored, it has been “harmonized.” We reveal how these companies reactively censor content, often in response to current events, creating “harmonized histories.” However, the keyword lists we collected show that censorship is not uniformly implemented across companies, revealing subtleties in how social media is controlled.

Censorship of social media in China is operated through a system of intermediary liability in which companies are held responsible for content on their platforms. In China, intermediary liability is called “self discipline” and companies are expected to follow it to ensure a “harmonious and healthy Internet.” Self discipline is a means for the government to push responsibility for information control to the private sector. Through the keyword lists, we open a window into how live streaming companies have practiced self discipline over the past year.

We find limited overlap in unique keywords between companies, suggesting that there is no centralized list of keywords provided to companies by authorities. While companies are under legal and regulatory pressures to manage and censor content, they appear to have a degree of flexibility in determining what specific keywords to block.

Events act as catalysts for censorship, with the applications quickly adding keywords in response to emerging news. Reporting on current events in China is strictly controlled and media groups are provided directives for how to present information. We do not see the same events censored between companies, suggesting that there are no common directives provided to them. Companies may be receiving different sets of directives or have varying levels of compliance with them, depending on where the companies are based and the provincial and municipal jurisdictions they directly answer to.

Recent studies suggest that censorship of social media in China targets content related to collective action while allowing criticism of the government to persist. Counter to those studies, we find clear censorship of keywords related to both collective action and government criticism. Censored keywords also included names of competitors, which appears to be motivated by business interests rather than government pressure. Therefore, instead of a narrowly focused censorship agenda driven only by government intent, we observe a more complex set of overlapping motivations and uneven implementation of controls.

This system of self discipline creates a more distributed form of governance than traditional top down models of authoritarianism. Peering into how this system works shows the complex interaction between the state and companies, revealing that censorship of social media is not necessarily harmonious.

Background

Which companies did we study?
YY Inc. and Tian Ge are early adopters in the live streaming industry

YY Inc. and Tian Ge Interactive Holdings Limited are the early leaders in the live streaming industry.

YY launched its core live streaming product in 2008, and has benefited from the recent surge in live streaming popularity, with its stock rising 64% between July and September 2016. YY Inc. is traded on the Nasdaq.

Figure 1: This screenshot of the YY interface shows a range of features including live streaming video, chat, and virtual item trading. Source: YY Investor Presentation

Tian Ge launched 9158 in 2008. In 2010, Sina Corporation invested 10 million dollars (representing a 25% stake) in Tian Ge and provided it with a sole license for the operation of Sina Show. Tian Ge is traded on the Hong Kong Stock Exchange.

Company              | YY Inc.  | Tian Ge
Public Listing       | Nasdaq   | SEHK
Product              | YY       | 9158, Sina Show
Registered Users     | 861.4 mn | 295.5 mn *
Monthly Active Users | 117 mn   | 16 mn *

Table 1: Company breakdown
* Tian Ge reports user numbers as an aggregate across its products
Source: Tian Ge Corporate Presentation, YY Inc Corporate Presentation

All Internet companies operating in China are held liable for content on their platforms and are expected to invest in staff and technology for ensuring compliance with government regulations. Failure to comply with regulations can lead to fines or revocation of operating licenses. This model of governance effectively pushes responsibility and liability for censorship down to the private sector. While punishments are clear, the regulations include vague definitions of prohibited topics such as "propagating heretical or superstitious ideas," “spreading rumors,” and “disrupting social order and stability.” Penalties are also sometimes applied in unpredictable ways motivated by reactive government campaigns. This environment pushes companies and users to both over-censor and self-censor—a phenomenon Perry Link described as the anaconda in the chandelier.

In their public filings, YY Inc. and Tian Ge describe business risks posed by failure to follow Chinese government content regulations and their efforts to comply with them through a combination of terms of service, manual content review teams, and automated content monitoring and filtering. Terms of service are particularly strict and list a wide range of prohibited activities for live streamers and audiences. For example, YY prohibits users from "producing, copying, publicizing, disseminating, or saving" content that might “sabotage national security,” “incite subversion of state power”, “threaten social stability,” “promote cults and superstition,” or is against any laws and regulations. These terms push responsibility and liability to the users in a manner similar to how the government pushes responsibility down to the private sector.

YY and Tian Ge describe teams that maintain "24-hour surveillance" on content and conduct random checks on video chat rooms. Supporting these efforts are keyword filtering systems to monitor and manage prohibited content as well as audio and video processing and monitoring capabilities, according to financial disclosure filings.

Throughout 2016, increased government regulation on the live streaming industry highlighted the balancing act these companies perform, attempting to grow their business and leverage new innovations while keeping within regulatory boundaries.

Law and Regulation

How has the Chinese government reacted?
Over the past year, increased regulatory pressures have been put on the live streaming industry in China.

In 2016, live streaming applications came under new pressures to ensure real name registration of live streaming performers and censorship of prohibited content.

On April 13, 2016, more than 20 companies, including Baidu, Sina, Youku, iQiyi, and 6 Room signed self-disciplinary agreements for content regulation. The agreements, collectively called the Beijing Internet Live-Broadcast Industry Self-Regulation Convention, require user-generated content be stored for at least 15 days and all live streaming users be certified through real-name registration. Users who stream content on "politics, guns, drug, violence, or pornography" will be blacklisted and banned from streaming on any live streaming application. In the first month after the Convention was published, forty live streaming users were blacklisted for allegedly streaming pornographic content. Companies that have signed the agreements must have the capability to conduct 24 hour monitoring of live streamed content and employ a combination of “manual” and “technology-based” inspection.

On April 14, 2016, China’s Ministry of Culture announced that a number of companies including Douyu, Panda TV, Huya, YY, Zhanqi TV, and 9158, were under investigation for hosting content that was too vulgar, sexual, violent, or incited users to commit crimes.

In July 2016, the Internet Crime Reporting Centre of China’s Ministry of Public Security (MPS) announced a campaign to determine if companies have followed and enforced new content management regulations. Specifically, the campaign focused on removing "illegal and harmful information" from these platforms, shutting down accounts or channels that disseminate illegal content, and punishing companies that break the regulations. The main targets of the campaigns were applications that are frequently reported by the public and Internet users; have allegedly been involved in providing sexual, gambling, and other illegal content; or fail to systematically enforce regulations.

The announcement also noted that MPS would step up the real name registration of live streamers and phone number registration of general users. In late September 2016, Chinese authorities announced that telecommunications service providers are required to verify the identity of all their phone users by the end of 2016. Authorities would also collaborate with companies to curb the dissemination of "pornographic, violent, horrifying, crime-inciting and other illegal information" or any sexual, gambling, and spam related activities. The campaign is part of China’s overall “Clean the Web” effort, in which Chinese companies make pledges to target harmful content on their platforms. This effort includes national level directives and actions taken at the municipal level that can include tighter restrictions.

For example, according to the Beijing Municipal Internet Law Enforcement Team there are three criteria used to determine if a platform is liable for content infringement:

(1) Sexual content is streamed for more than three minutes without action from the platform;

(2) A short video with sexual content (the duration is not specified, but presumably under three minutes) is on the platform for consecutive days;

(3) Live streams involving illegal content are announced in advance or promoted afterwards and the streaming user’s account is not banned by the platform.

Companies that do not comply with these criteria may face fines. Companies found to be in serious violation will be ordered to suspend business and face rescission of licenses and permits.

In Shanghai, companies are required to take action on illegal content within one minute of being notified. By the end of the Internet Crime Reporting Centre (ICRC) campaign, the Shanghai Public Security Bureau announced that over 1,000 live streamer accounts had been shut down and approximately 450,000 live streamers had verified their identities.

In early September 2016, China’s State Administration of Press, Publication, Radio, Film and Television (SARFT) released a "Notice On Issues Concerning Strengthening Management Over Internet Audio-Visual Live-Broadcast Services" that requires companies to obtain a license issued by SARFT to continue providing live streaming services. Prior to the notice, companies engaging in live streaming only required an Internet Culture Business Permit. Requirements for obtaining the new license include being state owned or state controlled, and having at least 10 million yuan in registered capital. Companies also need to invest in automated monitoring systems. This tightened regulation has led to concerns that startups and smaller companies may not be able to enter the market.

In November 2016, the Cyberspace Administration of China formalized rules on the live streaming industry by grouping them under a 20-point regulation that will be put into effect on December 1, 2016. The regulations affirm requirements for real-name registration, content monitoring and filtering, capacity to cut off live streams, and measures for blacklisting users. The regulations further state that user content must be retained for 60 days and live streams that provide news content must "obtain internet news information service credentials." News streams are also required to clearly indicate sources of news information, and companies that have news streams must appoint a chief editor. The regulations state that companies must have means to review live streams with news content prior to publication. The prior review for news-related streams will limit the spontaneity of live streaming and demonstrates the level of control the government is pushing companies to maintain.

Tracking Keyword Censorship

How did we discover censorship?
We reverse engineered the apps, found lists of blacklisted keywords, and tracked updates to them over time.

Through reverse engineering we found that the automated keyword monitoring and filtering systems described by YY and Tian Ge in their public filings are implemented on the client-side (i.e., in the application itself) rather than on the server-side (i.e., on a remote server).

In a client-side implementation, all of the rules to perform censorship are inside the application running on your device. Often the application has a built-in list of keywords that it uses to check whether any of these keywords are present in your chat messages before your messages are sent. If your message contains a keyword from the list then the message is not sent. The applications that we analyze in this report download an updated keyword list every time you run the software. Because censorship in these applications is implemented client-side, we were able to reverse engineer them and learn how they download keyword lists, and for those keyword lists that were encrypted, how they decrypt the keyword lists. We can then download the keyword lists for ourselves, and, after performing any necessary decryption, track updates to these lists over time.
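The check described above can be illustrated with a minimal sketch. It is not the code of any of the applications studied: the function names are hypothetical, and plain case-insensitive substring matching is an assumption, since the report does not specify how each client matches keywords.

```python
def find_blocked_keyword(message, keywords):
    """Return the first blacklisted keyword contained in message, or None.

    Assumption: case-insensitive substring matching. The real clients may
    normalize or match text differently.
    """
    lowered = message.lower()
    for keyword in keywords:
        if keyword and keyword.lower() in lowered:
            return keyword
    return None


def send_chat_message(message, keywords, send):
    """Call send(message) only when no blacklisted keyword is present."""
    if find_blocked_keyword(message, keywords) is not None:
        return False  # the message never leaves the client
    send(message)
    return True


# Illustrative list: both entries are keywords quoted in this report.
blacklist = ["南海仲裁", "海牙"]
outbox = []
send_chat_message("hello everyone", blacklist, outbox.append)
send_chat_message("南海仲裁 result is out", blacklist, outbox.append)
print(outbox)  # ['hello everyone']
```

Because the list and the check both live on the device, anyone who can read the application's files or network traffic can recover the list, which is what makes this kind of study possible.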

In our previous paper we found that some YY lists also trigger client-side surveillance. When a censored keyword is entered, a message is sent via HTTP GET request to a server that includes: the username of who sent the message, the username of who received the message, the keyword that triggered censorship, and the entire triggering message. All of the keywords in our updated collection would trigger this logging message. It is possible that this information is used to penalize users who break terms of service, but we cannot confirm this speculation.

We began performing hourly data collection of keyword lists between February and March 2015. The data collection start dates for each application are as follows: YY Feb 7, 2015; 9158 Feb 24, 2015; Sina Show March 11, 2015.

In an earlier paper we reported results from data collected between February 2015 and May 17, 2015. In total, we collected 17,547 unique keywords during that period.

In this report, we analyze data collected from May 18, 2015 to September 30, 2016. Over this collection period we collected a total of 2,044 additional keywords. Table 2 provides a breakdown of unique keywords added by each application since our first paper, excluding any keywords we had already seen on that platform.

Platform  | Keywords Added
YY        | 1468
Sina Show | 266*
9158      | 310

Table 2: Keywords Added by Platform
*The total number of keywords for Sina Show is 1,239 when including strings of numbers that we suspect are primarily phone numbers. We excluded these strings from our analysis.
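The counting behind Table 2 can be sketched as follows: compare each downloaded snapshot against everything already seen on that platform, and set aside digit-only strings. This is only an illustration under stated assumptions. The snapshot format, the function name, and the placeholder entries "alpha" and "12345678901" are hypothetical, and `str.isdigit` is a simple stand-in for whatever rule was actually used to identify number strings.

```python
def track_additions(snapshots, previously_seen=frozenset()):
    """Return [(timestamp, new_keywords)] for snapshots that add keywords.

    snapshots: iterable of (timestamp, list_of_keywords) in time order.
    previously_seen: keywords already observed on this platform.
    Digit-only strings (suspected phone numbers) are excluded.
    """
    seen = set(previously_seen)
    updates = []
    for timestamp, keywords in snapshots:
        candidates = {k for k in keywords if not k.isdigit()}
        added = candidates - seen
        if added:
            updates.append((timestamp, sorted(added)))
            seen |= added
    return updates


# Hypothetical hourly snapshots for one platform.
snapshots = [
    ("2016-07-12T01:00", ["alpha", "12345678901"]),
    ("2016-07-12T02:00", ["alpha", "12345678901"]),
    ("2016-07-12T03:00", ["alpha", "南海仲裁", "海牙"]),
]
updates = track_additions(snapshots, previously_seen={"alpha"})
print(updates)  # [('2016-07-12T03:00', ['南海仲裁', '海牙'])]
```

Counting the snapshots that add something gives the number of updates per platform, and summing the additions gives the keyword totals.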

While YY added the most keywords, Sina Show had more frequent updates to keyword lists (158 updates) compared to YY (138 updates) and 9158 (33 updates). Figure 2 shows the distribution of updates over the collection period by each application. Sina Show changed the URL of their keyword list download in an updated version of the client, and as a consequence we have no data for the first three months of Sina Show updates, before we discovered the new URL.

Figure 2: Distribution of keyword updates between May 18 2015 - September 30 2016 (GMT Time)

Comparing Keywords Between Platforms

Do the platforms censor the same keywords?
The blacklisted keyword lists vary between companies.

In addition to the live streaming applications we studied, previous work has identified client-side keyword censorship on chat apps used in China including TOM-Skype, Sina UC, and LINE. Comparing the unique keywords between these applications shows very limited overlap, suggesting that there is no centralized list of keywords provided to companies by authorities, which gives the companies a degree of flexibility in deciding what content to target and how to implement censorship. Previous studies of blog services and search engines in China have also found inconsistencies in how censorship is implemented between companies.

Analyzing similarity in unique keywords between the lists extracted from YY, Sina Show, and 9158 between May 2015 and September 2016 reveals limited overlap between the companies (YY and Tian Ge), but does show commonalities between Sina Show and 9158.

In Table 3, we compare the additions made to the keyword lists according to their Jaccard similarity coefficient (i.e., the size of the intersection of two sets divided by the size of their union). The results show little to no overlap in keyword lists between YY and 9158 and between YY and Sina Show, with some commonalities between 9158 and Sina Show.

9158 versus Sina Show | YY versus 9158 | YY versus Sina Show
17.28%                | 1.21%          | 0%

Table 3: Additions to keyword lists compared by Jaccard similarity
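As a minimal sketch of the Jaccard coefficient defined above, assuming each keyword list is treated as a set of exact strings (the function name and the toy sets are illustrative, not the report's data):

```python
def jaccard(a, b):
    """Size of the intersection of two sets divided by the size of their union."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty lists
    return len(a & b) / len(union)


# Toy example: 2 shared keywords out of 5 distinct keywords overall.
list_x = {"a", "b", "c"}
list_y = {"b", "c", "d", "e"}
print(f"{jaccard(list_x, list_y):.2%}")  # 40.00%
```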

In Table 4, we again compare additions to the keyword lists, this time according to a different similarity metric. We compute list x’s similarity to y as max (% of x in y, % of y in x). The intuition behind this metric is that it would tease out lists that inherit from other lists. This metric further confirms our previous result, showing greater overlap between 9158 and Sina Show and limited overlap with YY.

The overlap between Sina Show and 9158 is not surprising since the platforms are owned and operated by the same company. Despite this common ownership and degree of similarity, the keyword lists and timing of list updates are not identical, which suggests Tian Ge does not manage content on the two platforms in exactly the same way.

9158 versus Sina Show | YY versus 9158 | YY versus Sina Show
41.56%                | 4.02%          | 0%

Table 4: Additions to keyword lists compared by
similarity(x, y) = max(% of x in y, % of y in x)
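A short sketch of this second metric shows why it helps tease out inheritance: when a small list is fully contained in a much larger one, the Jaccard coefficient stays low but max(% of x in y, % of y in x) reaches 100%. The function name and the toy lists are illustrative only.

```python
def max_overlap(x, y):
    """max(share of x found in y, share of y found in x), as a fraction."""
    x, y = set(x), set(y)
    if not x or not y:
        return 0.0  # convention chosen here when either list is empty
    shared = len(x & y)
    return max(shared / len(x), shared / len(y))


# Toy example: "child" copies 2 of the 8 keywords in "parent" and adds nothing.
parent = {"a", "b", "c", "d", "e", "f", "g", "h"}
child = {"a", "b"}

jaccard_value = len(parent & child) / len(parent | child)
print(f"Jaccard:     {jaccard_value:.0%}")               # 25%
print(f"max overlap: {max_overlap(parent, child):.0%}")  # 100%
```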

Next, we use the same similarity metrics to compare the live streaming keyword lists to lists from chat apps extracted in previous research. Clustering the lists by Jaccard similarity we again find very little similarity between lists, and when lists are similar they are lists within the same company, as shown in the heat map in Figure 3.

Figure 3: Keyword lists clustered by Jaccard similarity
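The report does not state which clustering procedure produced the heat map, so the following is only a stand-in: a simple single-linkage grouping that puts two lists in the same group whenever their Jaccard similarity reaches a chosen threshold. The list names, contents, and the 0.5 threshold are all hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def group_by_similarity(lists, threshold):
    """Group list names whose pairwise Jaccard similarity is >= threshold.

    lists: dict mapping a list name to a set of keywords.
    Uses a small union-find, so groups are merged transitively.
    """
    parent = {name: name for name in lists}

    def find(name):
        while parent[name] != name:
            parent[name] = parent[parent[name]]  # path compression
            name = parent[name]
        return name

    for a, b in combinations(lists, 2):
        if jaccard(lists[a], lists[b]) >= threshold:
            parent[find(a)] = find(b)

    groups = {}
    for name in lists:
        groups.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical lists: two versions from one company, one from another.
lists = {
    "app_a_v1": {"k1", "k2", "k3"},
    "app_a_v2": {"k1", "k2", "k3", "k4"},
    "app_b": {"k9"},
}
groups = group_by_similarity(lists, threshold=0.5)
print(groups)  # [['app_a_v1', 'app_a_v2'], ['app_b']]
```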

In Figure 4, we compute the similarity of x to y as max (% of x in y, % of y in x) and see more lists similar to each other within companies. Lists from different companies remain mostly dissimilar, with one exception: GuaGua, a live streaming app we investigated in previous work that has a built-in list of keywords that it never updates, is similar to many Sina Show lists. Closer inspection reveals that the GuaGua list is a near exact duplicate of a 2004-era list built into Sina UC that Sina Show’s built-in lists incorporate. The only difference in the GuaGua list is the addition of a single keyword. Both of the founders of GuaGua formerly worked on audio chat software at Langma UC (acquired by Sina Corporation in 2004 to become Sina UC) and Sina Corporation. This employment history may explain why the GuaGua and Sina lists are so similar.

Figure 4: Keyword lists clustered by
similarity(x, y) = max(% of x in y, % of y in x)

Keyword Content Analysis

What kind of content is censored?
Censored content includes references to events, politics, people, social issues, and technology.

We used a combination of machine and human translation to translate the keywords to English and analyzed the context behind each one. Based on interpreting these translations with contextual information, we coded each keyword into content categories grouped under six general themes according to a code book we developed in previous work (see Table 5).

Theme      | Example Categories
Event      | Scheduled events, recurring events, current events
Political  | Communist Party of China, religious movements, ethnic groups
People     | Government officials, dissidents
Social     | Gambling, illicit goods and services, prurient interests
Technology | General technical terms, URLs, applications and services
Misc       | Keywords with no clear context that cannot be classified under other themes

Table 5: Content Themes and Related Categories

Figure 5 shows the distribution of themes across the three applications (normalized by total number of keywords on each app). In the following sections, we examine each theme in detail.

Figure 5: Distribution of themes across the three applications
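The normalization used for Figure 5 can be sketched as a per-application percentage: count the coded keywords in each theme and divide by that application's total. The function name and the placeholder keywords below are hypothetical. Only the theme labels come from Table 5.

```python
from collections import Counter


def theme_shares(coded_keywords):
    """Return {theme: percentage of this app's keywords coded under it}.

    coded_keywords: iterable of (keyword, theme) pairs for one application.
    """
    counts = Counter(theme for _, theme in coded_keywords)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {theme: 100 * n / total for theme, n in counts.items()}


# Placeholder keywords; a real input would be the coded list for one app.
coded = [
    ("kw1", "Social"),
    ("kw2", "Social"),
    ("kw3", "Event"),
    ("kw4", "Political"),
]
shares = theme_shares(coded)
print(shares)  # {'Social': 50.0, 'Event': 25.0, 'Political': 25.0}
```

Normalizing this way lets applications with very different list sizes be compared on the same scale.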


The Social theme is divided into three categories: gambling (e.g., online casinos), illicit goods and services (e.g., narcotics, weapons, counterfeit products), and prurient interests (e.g., sexuality, pornography, prostitution). Figure 6 shows the percentage of Social theme keywords by category (normalized by the total number of social keywords in each app).

The Social theme accounts for the highest percentage of keywords on each application relative to other themes (Sina Show: 59%, 9158: 50%, YY: 44%). The focus on this theme may reflect a reaction of the companies to the new regulatory campaigns that specifically target pornography, drugs, and weapons.

Figure 6: Percentage of Social theme keywords by category


The Event theme includes references to 20 distinct events. We correlate the timing of keyword list updates with events that happened within our collection period, and find reactive censorship driven by current events.

Reporting on current events in China is tightly controlled by government authorities. Media organizations are routinely provided directives on how to report the news. China Digital Times, an independent media group, occasionally publishes leaked directives sent to Chinese news organizations, which provide a glimpse into how this system works. There have also been leaks from social media companies, such as Sina Weibo, which describe censorship instructions from company managers that purportedly correspond to state directives. However, it is unclear in what form or at what frequency directives are provided and if companies receive the same ones.

YY has the largest number of event-related keywords (632 keywords) compared to 9158 (33 keywords) and Sina Show (31 keywords). YY also referenced more unique events (15) than 9158 (8) or Sina Show (8). We found similar results in our previous collection period, where YY also referenced the most events and included the highest number of Event keywords.

We compare unique keywords (that have not previously appeared on the keyword lists) across the platforms by Jaccard similarity and our similarity metric (see Table 6). Our results show no overlap in event keywords referenced between YY and Tian Ge operated applications, which suggests there are either no common directives provided to these companies or there is varying compliance with directives. However, we do see close similarity between Sina Show and 9158 Event keywords.

Metric                        | 9158 versus Sina Show | YY versus 9158 | YY versus Sina Show
Jaccard similarity            | 73.52%                | 0%             | 0%
max(% of x in y, % of y in x) | 92.59%                | 0%             | 0%

Table 6: Additions to Event keyword lists compared by Jaccard similarity and similarity(x, y) = max(% of x in y, % of y in x)

Only three events are referenced by all applications (June 4 1989 Tiananmen Square Massacre, the sentencing of Zhou Yongkang, and the Hague Verdict on the South China Sea arbitration).

Sina Show and 9158 reference the same 7 events. The only difference between them is that 9158 references the Tianjin Explosion and Sina Show references the Cultural Revolution. Event updates on the two applications are often made within the same period and sometimes on the same day. The close similarities between these applications can be explained by common ownership. However, the lack of complete overlap in event-related keywords between the platforms shows they still do not share an identical list.

Figure 7: Percentage of Event theme keywords by category

Below we examine the three events that all three applications referenced.

The June 4, 1989 Tiananmen Square Massacre remains one of the most taboo events in China. Reactive censorship on social media in China often accompanies the anniversary, and the Chinese government continues to push revisionist narratives of what happened.

Between late May and the first week of June 2016, leading up to the 27th anniversary of the Tiananmen Square Massacre, YY added 525 keywords related to the event. By comparison, 9158 and Sina Show each added 3 keywords on dates that did not fall close to the anniversary.

In our previous data collection period, YY keyword lists also had a heavy focus on June 4, accounting for over 90% of YY’s event keywords and 32% of YY’s lists overall. June 4 related keywords on YY’s lists include a number of ways to refer to the event including numerals ("89VIIV"); homonyms (陆4, “Land 4,” the character (陆 Lù) sounds similar to six (六 Liù) in Chinese); locations of annual memorial events (维园烛光, “Victoria Park Candle”); and references to recent discussion of the event such as “Trump June 4” (川普六四), which is likely related to Donald Trump referring to Tiananmen Square as a “riot” in an election debate.

On June 11, 2015, Zhou Yongkang, who was once one of China's most powerful political figures, was sentenced to life in prison on corruption charges. On June 11, YY added 23 keywords related to the sentencing (e.g., 無期徒刑 "life imprisonment"). Prior to the date of the sentencing, Sina Show and 9158 added references to associates of Zhou who were also implicated in his corruption case, including former PLA general Xu Caihou (徐才厚) and former Party official Ling Jihua (令计划).

In a case known as the South China Sea Arbitration, the Philippines, under provisions of the United Nations Convention on the Law of the Sea, brought complaints against China over territorial claims in the South China Sea.

On July 12, 2016, an international tribunal in the Hague ruled in favour of the Philippines and concluded that China has no legal basis to claim historical rights in the South China Sea. China rejected the ruling. On the same day, Sina Show added two keywords (南海仲裁 "South China Sea Arbitration", 海牙 “Hague”) and 9158 added one (南海仲裁 “South China Sea Arbitration”). On July 13, YY added two keywords, one referencing China’s rejection of the ruling (习总的拒绝 “President Xi's rejection”) and another related to a fake news story that went viral on Chinese social media following the verdict, which claimed China and the Philippines had declared war on each other and the Chinese army successfully wiped out a unit of the Philippine Air Force (全歼菲方空军 “Wipe out the Philippine Air Force”).

YY keyword lists include references to 11 events that do not appear on the other applications. Some of these events are clearly sensitive topics for the government, as shown by leaked directives.

Wukan is a fishing village in southern Guangdong that has earned renown for activism. In 2016, villagers took to the streets calling for the release of detained, democratically elected local leader Lin Zuluan and the resolution of a long-simmering dispute over land sales. China Digital Times published a leaked directive that was issued to news organizations on June 21, 2016 (China Digital Times does not disclose the issuing bodies to protect the sources of the leaks):

"Regarding former village committee chief of Wukan, Guangdong, Lin Zuluan being investigated and admitting his guilt, websites are strictly prohibited from releasing or re-publishing any news, photos, video, or information related to the mass incident in the village"

On June 22, YY added one keyword (林祖銮 "Lin Zuluan") followed by the addition of two keywords on June 23 (林祖戀 “Lin Zulian”, 還我書記 “Return our secretary”). It is unclear if YY received similar directives for handling the Wukan protests. While it is plausible, the lack of any Wukan-related keywords on Sina Show or 9158 suggests that distribution of these directives or compliance with them varies.

We observe a similar pattern in the censorship of President Xi’s gaffe during his opening speech at the 2016 G20 summit in Hangzhou. During the September 4, 2016 speech Xi mistakenly said "reduce taxes and make roads easy [to travel on], facilitate commerce and loosen clothing" (轻关易道通商宽衣), when he should have read “reduce taxes and make roads easy [to travel on], facilitate commerce and be lenient to farmers” (轻关易道通商宽农).

This slip of the tongue was clearly embarrassing for Xi. China Digital Times published a September 4 leaked directive that instructed online media to "filter and intercept content" related to "tongshang kuannong [通商宽农]" and "strictly delete comments, photos, videos, and related information." On September 5, YY added 17 keywords including “Xi undress” (習寬衣), “loosen the clothing and undo the belt” (寬衣解帶), and other references to the speech.

Events like the Wukan protest and G20 speech gaffe are clearly sensitive to Chinese authorities, and it is surprising to see them only referenced on one application. Other keywords found on YY are related to sensitive events specific to the application. In September 2015, YY added 6 keywords referencing an August 2015 incident during which a YY user apparently forgot to turn off her webcam and had sex with her partner while live streaming (yy出事视频 “yy accident video”, 忘关视频被啪 “forgot to turn off the video while having sex”). Videos of the incident circulated on Chinese social media causing a scandal. In this case, it is obvious that YY would be motivated to attempt damage control over the incident as it brings unwanted attention from authorities.

Overall, we find that censorship of events is dynamic and reactive, and in some cases can be correlated with directives sent to media organizations by government propaganda offices. However, we observe a lack of overlap in the unique events censored by different companies, suggesting either that no centralized directives are given to the companies or that levels of compliance differ. YY is registered in Guangzhou, whereas Tian Ge is registered in Hangzhou. Each company has to follow its respective municipal and provincial regulations. The companies may therefore be given different directives based on the location of their registration, which other studies have suggested may account for variance in how censorship is implemented. These results demonstrate that events are catalysts for censorship, but the ways in which they are managed are not uniform.
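The two comparisons used in this section, which keywords were added on which date and which keywords are shared across applications, can be illustrated with simple set operations. The following is a minimal sketch, not the tooling used for this report. It assumes each collected keyword list is available as a dated set of strings. The names `snapshots`, `additions_by_date`, and `shared_keywords` are hypothetical, and the snapshots are restricted to the Wukan-related keywords mentioned above.

```python
from datetime import date

# Hypothetical dated snapshots of one application's keyword list,
# limited to the Wukan-related keywords named in the text.
# The snapshot structure itself is an assumption for illustration.
snapshots = {
    date(2016, 6, 21): set(),
    date(2016, 6, 22): {"林祖銮"},
    date(2016, 6, 23): {"林祖銮", "林祖戀", "還我書記"},
}


def additions_by_date(snapshots):
    """Return {date: keywords added since the previous snapshot}."""
    added = {}
    previous = None
    for day in sorted(snapshots):
        current = snapshots[day]
        if previous is not None:
            new = current - previous  # set difference = newly added keywords
            if new:
                added[day] = new
        previous = current
    return added


def shared_keywords(lists_by_app):
    """Return the keywords present on every application's list."""
    lists = list(lists_by_app.values())
    return set.intersection(*lists) if lists else set()


if __name__ == "__main__":
    for day, words in sorted(additions_by_date(snapshots).items()):
        print(day.isoformat(), sorted(words))

    # Sina Show and 9158 had no Wukan-related keywords, so nothing is shared.
    latest = {
        "YY": snapshots[date(2016, 6, 23)],
        "Sina Show": set(),
        "9158": set(),
    }
    print(shared_keywords(latest))
```

Placing the dated additions next to the publication dates of leaked directives is what allows the kind of correlation described above. An empty intersection across applications corresponds to the lack of overlap we observe.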


The Political theme includes 18 categories related to issues including the Communist Party of China (CPC), ethnic minority groups in China, religious movements, and terrorism.

Figure 8: Percentage of Political theme keywords by category

All three applications have keywords related to the CPC. This content includes general references to the structure of the party and its various departments (e.g., 中央政治局 “politburo”, 中共中央 “CPC Central Committee”); allusions to factional struggles within the party (e.g., 习近平阵营和江派 “Xi and Jiang faction camp”); and pejoratives (e.g., 共匪 “Communist bandits”).

Keywords related to the Uyghur ethnic minority are also present on all of the applications. These keywords appear in Chinese and in the Uyghur language in both Arabic and Latin script. The content of the keywords ranges from religion (东突穆斯林 “East Turkestan Muslim”) and violence (partila “explode”) to separatism (تۈركىستان ئىسلام پار تىيىسى “Turkestan Islamic Party”, an Islamic separatist organization founded by Uyghur militants). Other keywords are more cryptic and lack clear context, such as “cloudy weather” (بۇلۇتلۇق ھاۋا) and “sweet potato” (ياڭيۇ تاتلىق). In our previous data collection period, Uyghur keywords were also present on all three applications and represented the largest percentage of keywords within the Political theme for Sina Show (45%) and YY (25%).
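Because these keywords span Chinese characters, Arabic script, and Latin script, a first pass at grouping them can rely on Unicode character names. The sketch below is illustrative only and is not the coding procedure used for this report. The helper names `script_of` and `scripts_in` are hypothetical. It detects script only, so it cannot distinguish Uyghur written in Latin letters from any other Latin-script text.

```python
import unicodedata


def script_of(char):
    """Return a coarse script label for one character, or None."""
    name = unicodedata.name(char, "")
    if name.startswith("CJK"):
        return "Han"
    if name.startswith("ARABIC"):
        return "Arabic"
    if name.startswith("LATIN"):
        return "Latin"
    return None  # spaces, digits, punctuation, and other scripts


def scripts_in(keyword):
    """Return the set of script labels that occur in a keyword."""
    return {label for label in map(script_of, keyword) if label is not None}


if __name__ == "__main__":
    # Example keywords quoted in the text above.
    for keyword in ["东突穆斯林", "partila", "بۇلۇتلۇق ھاۋا"]:
        print(keyword, scripts_in(keyword))
```

A keyword that mixes scripts, such as one combining a Latin application name with Chinese characters, yields more than one label and would need manual review.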

Titles of books dealing with sensitive topics that have been banned in China also appear on each application. These books, predominantly published in Hong Kong and Taiwan, include discussions of power struggles within the CPC (e.g., 老江气杀习大, “Old Jiang Enrages Uncle Xi”), and fiction critical of communist rule (e.g., 黄祸, “Yellow Peril” written by Wang Lixiong). China has strict regulations on the publishing industry, pushing dissident and tabloid authors to Hong Kong and Taiwan to publish on sensitive topics. The sale of banned books was highlighted in 2015 after five employees of a book shop and publishing firm in Hong Kong specializing in taboo titles went missing, only to later emerge in custody in mainland China. Their disappearances had a chilling effect on publishers in Hong Kong, who pulled sensitive titles from their shelves. One of the booksellers, Lam Wing-kee, revealed details of his detention at a press conference in Hong Kong on June 16, 2016. His name (林荣基) is included in the keyword lists on YY under the Event theme.


The Technology theme has five categories: censorship circumvention tools, URLs, hardware devices, Chinese software and websites, and phone numbers.

Figure 9: Percentage of Technology theme keywords by category

The hardware category includes 25 references to drones and other unmanned aircraft (e.g., 四旋翼无人机 “quadcopter”). While it is unclear why these keywords are censored, there is rising concern in China over the safety, privacy, and national security implications of drone technology, and regulation of it is increasing.

In the “Chinese websites and software” category we see instances of what may be the companies using censorship to gain competitive advantage. YY's lists include 25 keywords that reference competing live streaming services in China (e.g., 美拍直播, “Mei Pai Live,” 熊猫TV, “Panda TV”), and 9158 includes two keywords (e.g., 六间房 “Six Room”). We found similar content in our previous collection period, with all three applications adding names and URLs of competitors. The addition of these keywords may be an attempt to prevent users from being lured away from the provider’s platform.


The People theme includes two categories: names of CPC officials and names of dissidents.

Figure 10: Percentage of People theme keywords by category

References to dissidents include the renowned artist Ai Weiwei (艾未未), Chinese human rights lawyer Guo Feixiong (郭飛雄), and gender activist Ye Haiyan (referred to by her nickname “rogue yan” 流氓燕).

References to officials include current and former leaders (e.g., 李克强 “Li Keqiang”, current Premier of the State Council of the PRC; 胡锦涛 “Hu Jintao”, former Chinese President).

There are also numerous examples of playful, derogatory, and creative ways to refer to party leaders in the keyword lists. Keywords related to President Xi Jinping include an endearing nickname, “Daddy Xi” or “Uncle Xi” (习大大), which has been used in state propaganda but has reportedly been banned from official use recently to tone down Xi’s populist image. Other nicknames are more derogatory, such as “Bun Ruthless” (包子心狠手辣). The word for steamed bun (包子) is used to refer to Xi following the circulation of a photo showing him ordering lunch at a steamed bun shop, which was subsequently criticized as a political show, whereas “ruthless” (心狠手辣) criticizes Xi’s hardline rule over China. Chinese netizens often make creative use of the Chinese language in efforts to evade censorship. We see examples of this practice in references to Xi that reverse the order of the characters in his name (平近习) or use homoglyphs (刁近乎, diāo jìn hū) that appear similar to the characters of his name (习近平, xí jìn píng).

Researchers have argued that automated keyword censorship is ineffective because users find ways to circumvent the filters through creative use of language. The keyword lists we collected show that censors are clearly picking up on these practices and are engaged in a cat-and-mouse game with users. The censors will never be able to comprehensively censor speech through keyword filtering, nor will users always be able to evade these controls.
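The dynamic can be illustrated with a small sketch. It assumes, purely for illustration, that a filter blocks any message containing a listed keyword as a substring. The matching logic actually used by the applications is not described in this section, and the names `is_blocked` and `reversed_variant` are hypothetical.

```python
def is_blocked(message, keywords):
    """Naive filter: block a message if it contains any listed keyword."""
    return any(keyword in message for keyword in keywords)


def reversed_variant(keyword):
    """Return the keyword with its characters in reverse order."""
    return keyword[::-1]


name = "习近平"
reversed_name = reversed_variant(name)  # character order reversed
lookalike = "刁近乎"                     # visually similar characters

# A list containing only the original name misses both variants.
base_list = {name}

# A list that enumerates the variants, as the collected lists do, catches them.
expanded_list = {name, reversed_name, lookalike}

if __name__ == "__main__":
    for message in (name, reversed_name, lookalike):
        print(message, is_blocked(message, base_list),
              is_blocked(message, expanded_list))
```

Each new variant evades the filter until someone adds it to the list, which is why neither side can expect a permanent advantage.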

Conclusions

What's the takeaway?
Censorship of social media in China is not monolithic and involves a distributed and complex interaction between government and companies.

A theory explaining why certain content is targeted for censorship on Chinese social media, and the motivations behind that targeting, has recently gained attention. King et al. collected posts from Chinese social media websites and, through statistical analysis comparing censored and uncensored posts, contend that censorship focused on content that represented, reinforced, or encouraged collective action, while content critical of the government is often allowed to persist. Based on this finding, King et al. conclude that the intent of social media censorship in China is to disrupt ongoing or emerging collective action activities. Such theories present a centralized censorship program with clear intent that creates a uniform outcome across companies and platforms. Our research unearths a more complex reality.

Studies of chat apps, blogs, search engines, and live streaming apps in China have consistently found variance in how companies implement censorship. Our findings show clear evidence that there is no central list of banned keywords provided to companies. Companies may be receiving general directives on prohibited content, but these directives may differ based on where the companies are registered, and may provide only general instruction, leaving companies to determine the specific content of keyword lists themselves.

We also find a wide range of content that acts as a counterpoint to the collective action theory posed by King et al. We find keywords with clear references to collective action, such as protests, religious movements, and separatist causes. However, in addition to collective action references, there are numerous examples of government criticism, ranging from general descriptions of government bodies to rumors of factionalism, corruption, derogatory references to leaders, and officials caught in embarrassing gaffes. Therefore, while King et al. observed tolerance of government criticism on the platforms they studied, our analysis conclusively shows that live streaming apps specifically target this kind of speech. Furthermore, we find the companies censoring content seemingly motivated by their own business interests, as evidenced by censorship of competing products. Censorship may therefore not always be motivated solely by government pressures or political issues.

The popularity of live streaming shows how Chinese users are embracing new forms of expression, sharing, and commerce. The increase in regulation during 2016 demonstrates that this popularity is a concern for the government. The regulatory pressures are pushed down to companies, forcing them to react and practice self-discipline. However, this control is not achieved through traditional centralized governance, but rather through a form of “Networked Authoritarianism” in which a ruling party maintains control over the Internet, though in a more distributed and adaptive manner than classic authoritarianism. The blacklisted keyword lists give us a behind-the-scenes look into these efforts to create a “harmonious Internet” in China, revealing the multifaceted reality of how social media is controlled.

Acknowledgments

Project team: Masashi Crete-Nishihata, Andrew Hilts, Jeffrey Knockel, Jason Q. Ng, Lotus Ruan, Greg Wiseman.

We are grateful to Shazeda Ahmed, Adam Senft, and Uyghur Human Rights Project for research assistance, and Ron Deibert and Jedidiah Crandall for supervision. Special thanks to Lokman Tsui for inspiration.

This material is based upon work supported by the U.S. National Science Foundation under Grant Nos. #1314297, #1420716, #1518523, and #1518878. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation.
