16 December 2007


Chris Messina

Actually, I would argue AGAINST telling people to sign up just to clean things up (if that's really what you meant). Spock is kind of like the Yellow Pages, except worse, somehow.

And I don't think that they're doing the web a service; if anything, they're proving the perceived benefit of keeping to yourself and being private so this kind of harvesting of your personal details can never occur.

I'd think that LinkedIn wouldn't want their customers' data to be so leaky... public or not. What's the license on information you contribute to LinkedIn? Shouldn't you be able to control how and where it's reused? Especially in commercial contexts?

Jim Benson

Hey Chris,

Yeah, I agree.

I think that LinkedIn and other places that publish your information can't stop fair use. It also would be an argument that you gave up personal ownership of that information when you broadcast it in a public forum.

Would Dave Fisher or another lawyer who reads this maybe comment on the legal issues here?

It seems more an ethical issue than a legal issue. I can't imagine that Spock's VCs would let them do something patently illegal. Ethically problematic ... sure!

Beth Kanter

thank you for clarifying this -- a dilemma here

Bill Anderson

Jim, I was a bit suspicious when I started receiving invites from people I trust. (That in itself is an interesting development. I'm suspicious of invites from those I trust AND those I don't trust as much.)

I see this as another learning experience. Learning what happens when computer programs are let loose with some ill-thought-out assumptions coded into the software.

It's inevitable that this will happen more and more for several reasons. First, it's becoming easier to write code to crawl and mash up information. Second, there's the attitude that since that's true, let's just try it out everywhere. Third, since we're calling programs that manipulate data "smart" it's easy to confuse software processing with intelligence.

I expect to see more and more of this. The good news is that we are starting to think more and more of the social effects of technology. Finally.


Nancy White

Jim, I have been passing this URL around a lot today (as well as having blogged it.) I've been doing an informal poll of the people who "invited me."

1. So many invites that they decided to try it
2. A person they respected/trusted invited them
3. Curious
4. Wanted to fix erroneous data

Jay Fienberg

(Slightly nervous that, by commenting here, I'll feed Spock more info it can misuse.)

This reminds me a bit of Technorati, which presents search results as if they're pages that identify "you" (or, your blog posts) somehow. Those pages aren't more identifying than, say, a Google search on floccinaucinihilipilificates, but there's a system in place that peer pressures you to link or claim their pages as your own.


While I think misleading users to invite their friends to join is evil, I don't think Spock is any worse than any other social network out there.

In fact, I would argue that all social networks got started from spamming people. We all have heard of how MySpace got jumpstarted by their parent company's marketing arm, how Facebook spams Harvard dorm lists, how Flixster spams everyone on the Web, ... the list goes on and on.

I am not defending Spock, but I think you might be a bit too harsh on them. They at least have several steps for invites; whereas, a lot of social networks just send invites right after you sign in.

Sue Thomas

Well Jim, I signed up because *you* invited me, and you are a trusted friend! But are you saying now that you didn't actually invite me, just that Spock generated the invite itself? Bizarre. At any rate, I admit that I did join and looked at my info with some bemusement since some of it is correct but out of date, and other parts yet again confuse me with Sue Thomas FBEye! I am cursed with this - I even get her fan mail - and this kind of automatic profiling has no way to distinguish between the two of us.

Jim Benson

Hi Bob,

Yes, a few people on the net have wondered if I wasn't a bit too harsh. And I wondered it myself.

The scraping didn't bother me so much, but re-appropriating my data in a manner that specifically means to act as a representation of me - me personally - is inappropriate.

Especially when names are non-unique IDs for people. To hold that data back and then, when you do join, say, "Look, we did all this work for you!" That would be wonderful!

But to have multiple pages of the Jim Benson that is me and the Jim Benson of space exploration munged together on many pages ... that is not good.

Think of what is going to happen on Spock when some guy, we'll call him Bob, goes out and shoots up a church. Well, Spock will incorporate that into anyone its bots believe to be the same Bob.

That is not good. Not good at all. It may well be beard-worthy.


Hi - I was reading this post and wanted to take an opportunity to share some details on how we do some of the things talked about above. Please feel free to email me at jay@corp.spock.com and I am more than happy to discuss or meet in person and share any details on what we collect, how we collect it, and how we use information. I even welcome you to come to Spock HQ.

With regards to some of the pressing questions above.

1. Invites - I am sorry if many people have been receiving a lot of invites to Spock. Please bear in mind that we NEVER send unsolicited email and that all email is user generated. In addition, we clearly state that if you request trust from someone, they will get an email letting them know that you want to request their trust.

A bit more into the data to give some details. For example, today, we had over 1.5 million email lookups conducted on Spock, i.e., people asked Spock to scan their address book to see if any of their contacts were indexed on Spock. Out of those lookups, only 84,000 trust requests were sent out. This means that 96% of the time, people only want to see if Spock has indexed their friends and not invite them to join. I hope this can clarify the issue some people raised above that we trick people into sending invites. I have been trying to improve the process as much as possible to be clear and direct on how it works. Please let me know where I can be more clear. We change it every week based on user feedback. For example, users requested that they be able to see a copy of the email being sent to people they want to trust, which we will do in the near future.

Of the 4% of users who want to invite people, about 50% of those people invited to a trust network do sign-up and join that person's trust network. So, today, of the 84,000 trust requests sent, about 40,000 were accepted on the other end as well.
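
The percentages quoted above can be rechecked from the raw counts. This is only a rough sketch: it assumes the lookups and the trust requests cover the same day, and it treats each trust request as coming from a separate lookup, which the comment does not confirm. Under those assumptions the lookup-only share comes out nearer 94% than 96%, and the acceptance rate a little under 50%.

```python
# Figures quoted in the comment above; assumed to cover the same one-day window.
lookups = 1_500_000
trust_requests_sent = 84_000
trust_requests_accepted = 40_000


def share(part: int, whole: int) -> float:
    """Return part/whole expressed as a percentage."""
    return 100.0 * part / whole


# Share of lookups that led to a trust request (assumes one request per lookup).
invite_rate = share(trust_requests_sent, lookups)
# Share of lookups where the user only checked who was indexed.
lookup_only_rate = 100.0 - invite_rate
# Share of sent trust requests that were accepted.
acceptance_rate = share(trust_requests_accepted, trust_requests_sent)

print(f"lookups that led to a trust request: {invite_rate:.1f}%")
print(f"lookups with no trust request:       {lookup_only_rate:.1f}%")
print(f"trust requests accepted:             {acceptance_rate:.1f}%")
```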

We also track complaints delivered to Spock very carefully and take action asap. So far, we have not received complaints directly from people saying that we tricked them into inviting others. We HAVE received a lot of complaints that people are getting too many trust requests or that they are getting requests from people they do not know. I think we have figured out why this is happening.

a. We only allow you to send a trust request (when you ask us to scan your address book) if you have the email address of the person in question. We assumed that if you have someone's address in your address book, you knew them fairly well and the other person would know you as well. The problem here is that many email clients today automatically add anyone you email or who emailed you to your address book. So, a lot of people that you might not REALLY know are in the address book. This is a major cause of the issue. I am looking at ways to address this and would like your input. One way would be to limit the number of trust requests you can send to a certain number.

b. The other reason for a lot of trust requests is that Spock has become pretty popular amongst people who have large networks (such as recruiters, sales, biz dev folks etc). They want to have their entire network on Spock and send out a lot of trust requests. In several cases, where they wanted to trust a crazy high number of people from their address book, I stopped the process and emailed them directly to confirm that they did indeed want so many people in their trust network - and they said they did. I am sure that for someone with 5,000 people in their network, not everyone in that network would know them.
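
The idea floated in (a), limiting how many trust requests one person can send, could look roughly like the sketch below. Nothing here reflects how Spock actually works: the `TrustRequestLimiter` class, the `max_requests` cap, and the notion of holding back the overflow for manual confirmation are all hypothetical, chosen only to illustrate the suggestion.

```python
from dataclasses import dataclass, field


@dataclass
class TrustRequestLimiter:
    """Hypothetical per-sender cap on trust requests (not Spock's real logic)."""

    max_requests: int  # assumed cap; the comment does not name a number
    sent: dict[str, int] = field(default_factory=dict)

    def try_send(self, sender: str, recipients: list[str]) -> tuple[list[str], list[str]]:
        """Split recipients into those sent now and those held back by the cap."""
        used = self.sent.get(sender, 0)
        remaining = max(self.max_requests - used, 0)
        accepted = recipients[:remaining]
        held = recipients[remaining:]  # e.g. queued until the sender confirms
        self.sent[sender] = used + len(accepted)
        return accepted, held


limiter = TrustRequestLimiter(max_requests=3)
first_sent, first_held = limiter.try_send("recruiter", ["a", "b", "c", "d", "e"])
second_sent, second_held = limiter.try_send("recruiter", ["f"])
print(first_sent, first_held)
print(second_sent, second_held)
```

A cap like this would not distinguish a real acquaintance from an auto-added address, so it only bounds the damage rather than fixing the underlying assumption.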

2. How we crawl and gather data: We crawl the web the same way any other search engine does. We adhere to every robots.txt file from every site and we only crawl information in the public domain. We do not go into password protected sites, or pages that a site asks us not to crawl.
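
For readers unfamiliar with what adhering to robots.txt involves, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the user-agent string, and the URLs are made up for illustration; this is not Spock's crawler, just the general shape of the check a polite crawler makes before fetching a page.

```python
from urllib.robotparser import RobotFileParser

# Made-up robots.txt for an imaginary site; a real crawler would download it.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that has already been fetched."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed(parser: RobotFileParser, user_agent: str, url: str) -> bool:
    """Return True if the rules permit this user agent to fetch the URL."""
    return parser.can_fetch(user_agent, url)


parser = build_parser(ROBOTS_TXT)
agent = "example-people-crawler"  # hypothetical user-agent name
print(allowed(parser, agent, "https://example.com/about"))
print(allowed(parser, agent, "https://example.com/private/members"))
```

Note that robots.txt only expresses what a site owner asks crawlers to skip; it says nothing about what the people described on those pages would want.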

In the process of crawling, we do gather a lot of information about people. We made a decision a long time ago that we wanted to represent people in their best light, and not in a creepy way. We try to never display personally identifiable information (email, address, phone number, im, etc), even if we gathered it from a public source. We just don't think it is cool to show PII. We also do not sell our data and try to give attribution to every source where we got our data from. We are trying to improve on this process so that we can be even more clear on where the data came from and how we gathered it.

We also allow people to claim their search result and remove wrong information and add information that they want to be searched on. If people want data removed, we have an easy process for them to get that data removed.

I welcome your thoughts and ideas on what more we should be doing. Spock is a search engine that is going into a lot of uncharted areas, especially with the idea of community contributions to search results and the wisdom of crowds.


Andrey Golub

I want to comment only for this statement, Jim:

"...tell everyone I know to JOIN SPOCK - simply so they can clean up their bios" <- this does not seem to be a strategy- you'll have to spend all your time "cleaning" your Spock profile regularly, and asking some of your other "supporters" to help you with this job. Anyway it won't work.

Spock is a Search Engine 2.0, isn't it?
So if not you- the other people who know you, will fill this information again and again. Your profile is not owned by Spock, it's owned by Spock Community. Oh this crazy 2.0 world, right? :)
So the 2.0 is Evil as concept, not Spock?
Anyone including me or Jay could generate some new tags for you based on your LinkedIn or MySpace profile or your Blog. Not only robot but a real person- so what would you do in this case?

Robot is only to start. Google Robot is also the end, but Spock is 2.0 engine, so Spock Robot is proposing some information it could find about you, to verify and to complete for you and Community.
The Web 2.0 is self-referred, right? If one will ignore its on-line reputation- the Web 2.0 will decide it by itself, won't ask his/her request! It's only about LinkedIn we may say "I did share the info with LinkedIn not with the rest of the world", but it won't work about the people that know you, or just read about you somewhere on the Web so can fill/confirm some information about you.

So it's not enough just to "clean" your tags, you'll need to ask the Spock Team to CLOSE your account- only in this way you may disappear from searches.
but also this does not seem to be a strategy. Did you ever try to ask Google to delete some information about you on the Web? And in the case of Google we talk just about a page. On Spock you are a person. Your profile there "aggregates" your On-Line Identity. So the only way to escape from an on-line identity aggregator (Spock or any other; others exist, and lots of other sites/services will appear), if not to ask them to cancel and never open a record for you, is to "clean" ALL your tracks on the Web- from home pages and photos with tags, to Social Networking sites. Could this be possible? Of course not, there is no way back to a disconnected world.

think of Spock just as a people aggregator with a 2.0-management system- is there any chance to escape from being found by an aggregator, if your tracks are everywhere? No f... chance! But is this useful? I do not think so. My advice would be- to care about your on-line reputation, so Spock will be a friend for you, not Evil :) it will promote your tags, your pages and so on. Help you getting new readers for your Blog, help you finding a new job, why not? and so on! Spock's tags are very searchable by Google btw :)

So the choice here, I think, is to decide for yourself if the Web 2.0 or the Web in general is Evil or a Friend? If it's Evil for you- you'll get lots of problems from it for sure (Evil brings problems!), and even worse if you will fight the 2.0! I think the strategy is to understand the rules and to play by the rules of the 2.0 game. only this way!

btw I clearly understand that there are lots of people that could not watch their on-line reputation and manage it, like the VIP people and celebrities. So Spock must pay some great attention to the information its Robots gain for non-online people. Of course it's not enough to leave management of everything to the community. BUT for the people that accept the on-line 2.0 game rules- all I wrote above is valid. it's our responsibility and our biggest interest to maintain our reputation! I think Spock only helps us- it puts together all that the Web knows about us + allows Web 2.0 to add some of its members' opinions.

I wish good and correct tags to everyone! :)
Kind Regards,
Andrey G.


The problem isn't that 2.0 is evil, the problem is that the Spock platform seems to ignore one of the most critical aspects of any online community — the ability to know where information comes from.

People are accustomed to search engines. If I look up a person on Google, I know and they know that I may find information about them that is out of date, false, or, at the very least not by them. However, if I go to their profile on a social networking site, I expect and they expect that what I see there will be information created by them with which they chose to be associated when they posted it there.

Both of these paradigms are just fine, the problem is creating a space where it is not clear which rules apply. It should be possible, web-wide, to distinguish biographical from autobiographical pages at a glance. Having a single name/page-space containing pages which could be either, depending on circumstances, is a recipe for disaster.

Andrey Golub

I will most probably agree with you about that critical issue ignored [yet] :) by Spock, but my thought and comment above aimed to explain that since Spock is a Search Engine 2.0- it does not really matter for the END result, your tags, who had added that piece of information- Spock or the community!
I was commenting not in general about it all, but exactly about the phrase of Jim "JOIN SPOCK - simply so they can clean up their bios"- that will never work I believe. it's only about that. the basic idea is that it's not possible and does not make any sense to fight the community, although I totally agree that the INITIAL information that comes from Spock Robot should be presented better to Web Users, and so on, as you have said yourself.

the issues raised by you- those are serious, I will think about it a bit and then come with another comment here.
btw if you have a clear idea for how this could be improved (how to better present the results of Spock searches)- why not suggest it to Spock Team? As we do it on the Groups on Facebook (for fans and supporters) and Google Group for brainstorming 2.0

thank you,
Andrey Golub

Kathy Jacobs

I've been going back and forth on how to comment on this entry. I think that a lot of how one looks at Spock depends on how you look at social networks. Two of the ways you can look at a social network are that they let the world see information about you or they let you see information about others. Just as with any search, when you search for information about someone on Spock, you are working from the assumption that the information is correct. (Whether that information is about you or someone else.)

What I like about Spock is that it tells me not just who the person is and how active they are, but also whether others have agreed with what is said about them. If someone posts on a board something about themselves that is not quite true, it is there forever. On Spock, the untrue information is caught by the community and corrected by community vote.

Because I know that the community helps to verify the information, I also know that if incorrect information is provided the community will help to correct it. If someone puts a tag on some other Kathy Jacobs that is meant for me, it can be voted down and out. Contrast that with a flat web search where there is no information about whether the "Kathy Jacobs" whose results are returned is me or some other person.

In Sue Thomas's case, the search does come up with several results, with the TV show results showing first. However, because Sue's comment is linked to a page with her picture, one can scroll through the Spock results and find the "right" Sue Thomas. Finding the right Sue can then provide the information that says who she is. (FYI: If one wanted to, one could even add a tag to Sue's Spock page that says "Not the TV Show Sue Thomas" :) ) Taking the example one step further, if you add the simple word "professor" to the search request, you get Sue's results as the second match.

When Sue votes on the tags she believes match reality, her power in the community grows and her voice becomes stronger. That in turn would raise her up in the search results.

Ok... enough. Time to go on to other things. If you are interested in checking out my Spock page, you can find it at:

Feel free to let me know what you think of my comments.


Andy, I'm not talking about the difference between information added by Spock or by the community (although that's another important one). I'm talking about the difference between information added by others and information added by me.

Consider MySpace, Wikipedia, and Google.

When I go to someone's MySpace page, I expect to see information about that person that was created and posted on purpose by them. The fact that I can assume this isn't incidental; it's an important part of the content of the page.

When I look up a Wikipedia article on a person, I expect to see information about them that was created by other people, but I can assume (for the most part) that this information was assembled on that page under that heading by human beings who had a common idea about whom they were writing. For example, if there's more than one person named Bob Smith noteworthy enough to have a page on Wikipedia, I expect that the problem of which one is which will already have been dealt with, so that a given page will correspond to just one of them.

When I Google someone's name, on the other hand, I know that the page I see (the search results page itself) is assembled by a machine. It consists of snippets of information that already existed elsewhere, but those snippets may be false or irrelevant, and they may not all pertain to the same person. When I go to a search engine, I know that this is what I'm going to get, so I expect that I may have to ignore some irrelevant information on my way to what I am looking for.

So far the Web has mostly kept firm lines between these three kinds of information, and sites that don't are rightly regarded as somewhat shady. You claim some magic exemption for Spock because it is "2.0", but that's not what 2.0 means. All three types of information described above are "2.0", but a confusing mishmash of them isn't.

It's easy to think that, because all of these kinds of information are useful, you can throw them together and they will still be useful, and people who need to know which kind of information they're looking at can look it up later. However, this isn't so. Whenever you read a text, the attitude you bring to it and the information you glean from it are deeply affected by the context in which it appears. The three kinds of information I described above (which we might call autobiographical, heterobiographical, and machine-compiled) are useful because users of them know which they are using. Each context has its own rules for what is claimed about information, how it is to be understood, how much it is to be trusted, etc. Remove the context and you render all three kinds of information sad shadows of themselves, useless to most people.

I have no problem with a site that gathers all three kinds of information in one place, but a real Web 2.0 site would make it easy to tell up front which kind of information is which. In particular, it would not make the misleading claim that someone "already has a profile" (this is widely understood to imply a page of autobiographical information) when the site contains a machine-compiled page about that person.
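
One way to picture the "tell up front which kind is which" requirement is to attach a provenance label to every piece of information and group the display by it. The sketch below is purely illustrative: the `Provenance` and `Claim` types and the `has_profile` rule are invented names for this example, not anything Spock or any other site is known to implement.

```python
from dataclasses import dataclass
from enum import Enum


class Provenance(Enum):
    """The three kinds of information distinguished in the comment above."""

    AUTOBIOGRAPHICAL = "written by the subject"
    HETEROBIOGRAPHICAL = "written by other people"
    MACHINE_COMPILED = "assembled by a crawler"


@dataclass(frozen=True)
class Claim:
    text: str
    provenance: Provenance


def render(claims: list[Claim]) -> str:
    """Group claims under a visible heading for each kind of provenance."""
    lines: list[str] = []
    for kind in Provenance:
        texts = [c.text for c in claims if c.provenance is kind]
        if texts:
            lines.append(f"[{kind.value}]")
            lines.extend(f"  - {t}" for t in texts)
    return "\n".join(lines)


def has_profile(claims: list[Claim]) -> bool:
    """Only say someone 'has a profile' if they wrote part of it themselves."""
    return any(c.provenance is Provenance.AUTOBIOGRAPHICAL for c in claims)


crawled_only = [
    Claim("Founded a space exploration company", Provenance.MACHINE_COMPILED),
    Claim("Works on collaboration software", Provenance.MACHINE_COMPILED),
]
print(render(crawled_only))
print(has_profile(crawled_only))
```

Under this rule, a page built entirely by a crawler would never be described to anyone as that person's profile.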

Andrey Golub


I think this time I share all that you said- sounds very reasonable and could be some strong input for Spock to improve!
I can assure you that some similar issues have already been in discussion for a long time inside a small team of Spock supporters.

I'll repeat it once again :), that Spock is my favourite Web 2.0 project today 'coz it's completely 2.0, including the way as anyone can contribute to its roadmap. + it's a new experience- so for me and other guys with similar passion to Spock, being its active supporters and contributors (advisers 2.0, hmm?), it's also a research about one of the most interesting topics of the Web 2.0 today!
So your last conclusions, I'll be happy to submit them for discussion to the Spock Brainstorming 2.0 group on Google (tell me if you want to join it- I'll be happy to invite you there).

There we already have some talks about the contextual trust, contextual grouping, private tags and the weight of the Profile Owner's votes for his profile.
There are many sides of Web 2.0 and Search Engine practise to consider if we wanna build a real strong People Search Engine 2.0, useful to all and reliable? I think everybody needs one like this, right?
Spock or anything else, it's better Spock since it's already here ;)

Kind Regards,
Andrey Golub- a Spock Evangelist and Blogger

