What should AI Chatbots know about you?
New tools, new rules.
When we used encyclopedias and phone books and did old-fashioned research, we didn’t think much about those actions; they were largely private and inconsequential. If someone watched us put a book back in the library or saw us reading a particular newspaper, no one thought twice about it. Who cares? What is the harm? Not a big deal. Still isn’t.
Then the internet arrived and changed how we get content. In Web 2.0 there was no person to ask for help, so the internet needed a way to surface what you were looking for. The information explosion that came with it had to be filtered so you could actually find anything. The cold start problem had to be solved so that people weren’t fed random content they had no interest in. Imagine walking into a library where the librarian keeps pulling out books you don’t care about. Not a great user experience, and at the scale of the internet, not a workable one either.
To sift through the noise, the system had to know you and your interests. Recommender systems, as part of predictive analytics, had to be developed to power the algorithmic curation that is at the heart of the internet. This is a good example of how new technology creates fields and jobs that never existed before and had never even been thought of. Big data and predictive modelling became industries of their own, and you became a cookie.
User behaviour became a commodity. Surveillance capitalism was born, and you can be instantly recognized through device fingerprinting and server-side tracking. Deterministic and probabilistic models create a Single Customer View that follows you around the internet, no matter which device or platform you are using.
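The deterministic side of this is simple to illustrate. A minimal sketch of device fingerprinting, assuming only a handful of hypothetical attributes (real fingerprinting services combine dozens of signals such as canvas rendering, installed fonts, and audio stack quirks):

```python
import hashlib

def device_fingerprint(attributes: dict) -> str:
    """Hash a set of device/browser attributes into a stable identifier.

    Illustrative only: the attribute names here are made up for the
    example, not any real tracking vendor's schema.
    """
    # Sort the keys so the same attributes always produce the same hash.
    canonical = "|".join(f"{k}={attributes[k]}" for k in sorted(attributes))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]

device = {
    "user_agent": "Mozilla/5.0 (X11; Linux x86_64)",
    "screen": "1920x1080",
    "timezone": "America/Vancouver",
    "language": "en-CA",
}

# The same device yields the same ID on every visit -- no cookie required,
# nothing stored on the user's machine, and nothing for them to delete.
visit_1 = device_fingerprint(device)
visit_2 = device_fingerprint(device)
print(visit_1 == visit_2)  # True
```

The point of the sketch is that the identifier is derived, not stored: clearing cookies does nothing, which is exactly why fingerprinting underpins the "follows you around" behaviour described above.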
Having all the information in the world at your fingertips is empowering and deeply underappreciated. It is also massively expensive to provide, yet we get it for free. The saying “if you are not paying for the product, you are the product” is a foundational concept of the digital economy, and more or less, we have learned to accept it.
Now we have entered the next phase with the era of AI, and this should restart the conversation about privacy and what chatbots (companies) should and shouldn’t collect from you. The reason this time is different is that it is a totally different tool. People are having conversations with AI chatbots, planning with them, and actually using them to build things.
The individual responsible for the horrific Tumbler Ridge mass shooting, according to a report, used ChatGPT to explore scenarios of gun violence. His interactions raised alarms within the company, but after internal discussion, it decided they didn’t meet the "imminent and credible risk" threshold, and so it didn’t notify authorities.
In hindsight, that looks like a grave error, but is it? The danger wasn’t “imminent” at the time, and what could authorities have done if it had been reported back then? And what if OpenAI had misread the nature of the conversation and reported it in error? Questions about a scene or a script in development, for example, could very easily be mistaken for real intent that demands a real-world response.
So what is the AI chatbot’s role? Should it mind its own business? How can authorities act on what amounts to a prediction? Who is liable for a failure, the chatbot or the authorities? Watch the movie Minority Report.
It’s clear that we are dealing with new variables we have no clear answers for, and as this technology is still in its infancy, the number of these situations will only grow. Solutions and protocols will be needed. A family impacted by the tragedy is suing OpenAI, and it may prove to be one of the most important cases, the one that starts the ball rolling.
The other aspect of how this technology is used is when people actually build things with it, be it apps, discovery processes, computations, or training models. What protection do users have against back-end scraping?
Some business and enterprise AI tiers aren’t used to train models, but many free and consumer tiers are set to “improve the model” by default. It is also unclear what flags these systems can set on user work for discovery purposes, the way they already do for violent content.
How AI-generated content merges with human authorship is the open deliberation for IP protection and privacy. Will users whose input parameters shape an output innovation count as human inventors? Does the user retain any ownership over what the model produces from their input, and is that input kept from training a different company’s model?
As with all new things, many new scenarios are generating new questions. With technology as fast-moving and impactful as AI, we need to start thinking about answers sooner rather than later.


