Should Internet Firms Pay for the Data Users Currently Give Away?
网络公司应为用户产出的数据付费

时间：2024-05-07

刘莉

And, as a new paper proposes, should the data-providers unionise?
还有，正如一篇新论文所提议的，数据提供者是否应当成立工会？

You have multiple jobs, whether you know it or not. Most begin first thing in the morning, when you pick up your phone and begin generating the data that make up Silicon Valley’s most important resource. That, at least, is how we ought to think about the role of data-creation in the economy, according to a fascinating new economics paper. We are all digital labourers, helping make possible the fortunes generated by firms like Google and Facebook, the authors argue. If the economy is to function properly in the future, and if a crisis of technological unemployment is to be avoided, we must take account of this and change the relationship between big internet companies and their users.

Artificial intelligence (AI) is getting better all the time, and stands poised to transform a host of industries, say the authors (Imanol Arrieta Ibarra and Diego Jiménez Hernández, of Stanford University, Leonard Goff, of Columbia University, and Jaron Lanier and Glen Weyl, of Microsoft). But, in order to learn to drive a car or recognise a face, the algorithms that make clever machines tick must usually be trained on massive amounts of data. Internet firms gather these data from users every time they click on a Google search result, say, or issue a command to Alexa. They also hoover up valuable data from users through the use of tools like reCAPTCHA, which ask visitors to solve problems that are easy for humans but hard for AIs, such as deciphering text from books that machines are unable to parse. That does not just screen out malicious bots, but also helps digitise books. People “pay” for useful free services by providing firms with the data they crave.

These data become part of the firms’ capital, and, as such, a fearsome source of competitive advantage. Would-be startups that might challenge internet giants cannot train their AIs without access to the data only those giants possess. Their best hope is often to be acquired by those very same titans, adding to the problem of uncompetitive markets.

AI’s contributions to productivity growth are small for now, the authors say, partly because of the free-data model, which limits the quality of data gathered. Firms trying to develop useful applications for AI must hope that the data they have are sufficient, or come up with ways to coax users into providing them with better information at no cost. For example, they must pester random people (like those blur-deciphering visitors to websites) into labelling data, and hope that in their annoyance and haste they do not make mistakes.

Even so, as AI improves, the amount of work made vulnerable to displacement by technology grows, and ever more of the value generated in the economy accrues to profitable firms rather than workers. As the authors point out, the share of GDP paid out to workers in wages and salaries, once thought to be relatively stable, has already been declining over the past few decades.

To tackle these problems, they have a radical proposal. Rather than being regarded as capital, data should be treated as labour and, more specifically, regarded as the property of those who generate such information, unless they agree to provide it to firms in exchange for payment. In such a world, user data might be sold multiple times, to multiple firms, reducing the extent to which data sets serve as barriers to entry. Payments to users for their data would help spread the wealth generated by AI. Firms could also potentially generate better data by paying. Rather than guess what a person is up to as they wander around a shopping centre, for example, firms could ask individuals to share information on which shops were visited and which items were viewed, in exchange for payment. Perhaps most ambitiously, the authors muse that data labour could come to be seen as useful work, conferring the same sort of dignity as paid employment: a desirable side-effect in a possible future of mass automation.

The authors’ ideas need fleshing out; their paper, thought-provoking though it is, runs to only five pages. Parts of the envisioned scheme seem impractical. Would people really be interested in taking the time to describe their morning routine or office habits without a substantial monetary inducement (and would their data be valuable enough for firms to pay a substantial amount)? Might not such systems attract data mercenaries, spamming firms with useless junk data simply to make a quick buck?

Nothing to use but your brains

Still, the paper contains essential insights which should frame discussion of data’s role in the economy. One concerns the imbalance of power in the market for data. That stems partly from concentration among big internet firms. But it is also because, though data may be extremely valuable in aggregate, an individual’s personal data typically are not. For one Facebook user to threaten to deprive Facebook of his data is no threat at all. So effective negotiation with internet firms might require collective action and, perhaps, the formation of a “data-labour union”.

This might have drawbacks. A union might demand too much in compensation for data, for example, impairing the development of useful AIs. It might make all user data freely available and extract compensation by demanding a share of firms’ profits; that would rule out the pay-for-data labour model the authors see as vital to improving data quality. Still, a data union holds potential as a way of solidifying worker power at a time when conventional unions struggle to remain relevant.

Most important, the authors’ proposal puts front and centre the collective nature of value in an AI world. Each person becomes something like an oil well, pumping out the fuel that makes the digital economy run. Both fairness and efficiency demand that the income generated by that fuel be shared more evenly, according to our contributions. The tricky part is working out how.

不论你知道与否，其实你正身兼数职。其中大多数工作从清晨就开始了：你拿起手机，开始产生数据，而这些数据构成了硅谷最重要的资源。一篇引人入胜的经济学新论文提出，我们至少应当从这个角度去思考数据创造在经济中的角色。作者们认为，我们所有人都是数字劳工，为谷歌、脸书之类的公司创造财富出了力。要想让未来的经济正常运转，要想避免技术带来的失业危机，我们就必须考虑到这一点，改变大型互联网公司与其用户的关系。

人工智能（AI）日新月异，时刻准备着让一系列行业转型换代，论文的作者们（来自斯坦福大学的伊马诺尔·阿列塔·伊瓦拉与迭戈·希门尼斯·埃尔南德斯，来自哥伦比亚大学的伦纳德·戈夫，来自微软公司的雅龙·拉尼尔与格伦·韦尔）表示。不过，为了学会驾驶汽车或识别人脸，智慧机器所用的算法通常需要先用海量数据加以训练。互联网公司的数据，来源于用户对谷歌搜索结果的每一次点击、对亚马逊语音助手Alexa发出的每一条指令。他们还会使用reCAPTCHA之类的工具，从用户身上获取有价值的数据。这类工具要求访客去解决对人类很容易但AI却难以胜任的问题，例如辨认书中那些机器无法识别的文字。这样做不仅能筛除恶意自动程序，还有助于将纸本图书电子化。人们向互联网公司提供他们渴求的数据，从而为免费又好用的服务“买单”。

这些数据不但成为了互联网公司的资本，更可以带来惊人的竞争优势。跃跃欲试的创业公司也许会向互联网巨头发起挑战，但却必须借助巨头手中的数据才能训练自家AI。他们最好的结局往往是被巨头同行收购，让竞争本就不够充分的市场雪上加霜。

论文作者们认为，目前AI对生产力增长的贡献不大，部分原因在于免费数据模式限制了数据采集的质量。若要开发实用的AI应用，互联网公司必须寄希望于充足的数据，或者想办法诱导用户无偿向其提供更优质的信息。例如，他们必须缠着随机人群去给数据贴标签，比如那些要识别模糊验证码的访客，而且还要希望他们在烦扰和匆忙中不出错。

即便如此，随着AI的改进，越来越多的工作会因技术进步而被取代，所产生的经济价值也会更多地落入赢利公司而非工人手中。作者们指出，薪水支出所占的GDP份额曾被认为是相对稳定的，但过去几十年间却每况愈下。

为了应对这些问题，他们提出了一种激进的方案。数据不应该被当作资本看待，而应被视为劳动，更具体地说，应被视为数据产生者的财产，除非他们同意向公司提供数据以换取报酬。如此一来，用户数据可以多次出售给多家公司，从而削弱数据集作为准入门槛的作用。向提供数据的用户支付报酬，有利于将AI创造的财富分配开来，也让互联网公司有望获得更好的数据。举个例子，与其猜测顾客在商场里闲逛时做了些什么，公司不如付费请人们分享信息，说明他们到访了哪些店铺、浏览了哪些物品。论文作者们最大胆的设想也许是，数据劳动可能会渐渐被视作一项有用的工作，像带薪职务一样赋予人们尊严。在未来可能出现的大规模自动化时代，这会是一种令人期待的附带效果。

这些作者的想法虽然发人深省，但只有区区五页篇幅，还需详加阐述。他们设想的这个体系里，有些部分似乎不切实际。如果没有可观的酬金，人们是否真有兴趣花时间描述自己每天早上的起居或办公室里的习惯（他们的数据又是否真那么宝贵，值得互联网公司大掏腰包）？这些体系会不会引来一众数据雇佣兵，为了挣快钱而拿没用的垃圾数据敷衍交差？

除了大脑别无可用

当然，这篇论文仍然包含一些重要洞见，应当成为探讨数据在经济中所扮演角色的框架。其中一点涉及数据市场中权力的失衡。这种失衡部分源于大型互联网公司的集中，还有一个原因则是，尽管数据总体的价值极高，个体提供的单一数据一般却无足轻重。就算某位用户威胁不再向脸书提供他的数据，也不会对脸书构成任何威胁。因此，要与互联网公司进行有效磋商，可能需要采取集体行动，也许还需要成立一个“数据工会”。

这样做也许有其弊端。比如说，工会也许会开出过高的数据价码，令实用AI的开发受阻。工会也许会要求互联网公司以利润分成来换取免费使用所有数据的权利。这就与论文作者们主张的数据付费劳动模型背道而驰了。他们认为该模型对提高数据质量至关重要。不过，在传统工会惨淡经营之际，数据工会作为巩固工人权力的一种方式还是有前景的。

最重要的是，作者们的提议将AI世界中价值的集体性本质放到了聚光灯下。每个人变成了像油井一样的东西，从中可抽出数字经济赖以运行的燃料来。不论是出于公平还是效率的要求，那种燃料产生的收入都应当按劳分配。至于如何实现，则是难点所在。