Beliefs & Biases in Web Search
Ryen White, Microsoft Research

Bias in IR and elsewhere
- In IR, e.g.:
  - Domain bias: people prefer particular Web domains
  - Rank bias: people favor high-ranked results
  - Caption bias: people prefer captions with certain terms
- In psychology, e.g.: anchoring-and-adjustment, confirmation, availability, etc.
- All impact user behavior
- Opportunity to intersect psychology and IR

Our Interest in Biases
- Bias can be observed in IR in situations where searchers seek or are presented with information that significantly deviates from the truth
  - User behavior
  - Search engine behavior
- More on the "truth" later

Outline for Remainder of Talk
- Initial Exploratory Questionnaire
- Log Analysis
- Labeling Content and Truth
- Findings
- Conclusions

Initial Exploratory Questionnaire
- Gain early insight into possible biases in search
- Focus on Yes-No questions (answered with "Yes" or "No")
- Simplicity: answers along a single dimension (Yes/No)
- Microsoft employees; recall recent Yes-No query (in last 2 weeks)
- Asked about belief beforehand and afterwards
- Multi-point scale: Yes / Lean Yes / Equal / Lean No / No
- 200 respondents. Recalled questions such as:
  - "Does chocolate contain caffeine?"
  - "Are shingles contagious?"

Survey Results
- Two main findings:
  1. Respondents kept strongly-held beliefs (Yes-Yes and No-No)
  2. If Before = Equal, then 2x as likely to believe Yes after search
- Motivated us to: further explore possible impact of biases on behavior and outcomes
- Post-search belief given pre-search belief
- Confirmation?

Log-Based Study of Yes-No Queries
- Queries, clicks, and results from Bing logs (2 weeks)
- Mined yes-no questions: start with "can", "is", "does", etc.
- Focused on health since it's important and we could get truth
- Randomly selected set of 1000 yes-no health questions
- Each issued by at least 10 users, same top 10, same captions
- Examples include:
  - "Is congestive heart failure a heart attack?" (answer = No)
  - "Do food allergies make you tired?" (answer = Yes)

Other Data Collected
- Yes-No answer labels for captions and content of results
- Physician answers for the Yes-No questions

Answer Labeling
- Captions and result content
- Crowdsourced (Clickworker)
- 3-5 judges/caption (consensus)
- Task was to assign label of:
  - Yes only
  - No only
  - Both (Yes and No)
  - Neither (not Yes and not No)
- Agreement on 96% of captions
- Performed similar labeling for each top 10 search results
  - Crowdsourced judges, agreement on 92% of pages
- Example caption labels: Yes only, No only, Both, Neither

Physician Answers
- Two physicians reviewed the 1000 questions and gave answers
  - Including: 50/50 = need more info; Don't know = really unsure
- Agreement between physicians on Yes-No was 84% (κ = 0.668)
- Focused on the 680 questions where both agreed Yes or No
- Distribution: 55% Yes and 45% No (used as TRUTH in our study)

Using Physician Answers as Truth
- Used consensus physician answers as truth in three ways:
  - How closely does distribution of results match the truth?
  - How closely does interaction behavior match the truth?
  - How closely do answers that people reach match the truth?
- Bias = distributions significantly differ from 55-45 Yes-No base rates

Taking Stock of Our Data
- We have:
  - 680 Yes-No health questions from search logs
  - Ground truth for each question via physicians' consensus judgments
- For each question we have:
  - HTML content of top 10 search results, plus:
    - Caption labels for Yes/No/Both/Neither
    - Result labels for Yes/No/Both/Neither
  - Clickthrough behavior from logs

Analysis
- Three directions for analysis:
  - Study ranking of results with Yes-No content
  - Study user behavior w.r.t. Yes-No content
  - Study answer accuracy for Yes-No questions

Result Ranking
- Volume of Yes-No content in the results
  - Percentage of captions or results with answer
  - More Yes content in top 10 than No content
- Relative ranking of top Yes-No content when both in top 10
  - Percentage of SERPs where top Yes caption or result appears above (nearer the top of the ranking than) the top No
  - Yes content ranked above No more often (when both shown)

User Behavior (Clickthrough rate)
- Studied clickthrough rates on captions containing answers
- Controlled for rank by just considering top result (r=1)
- SERP click likelihoods for different captions given variations in answer presence in SERPs/captions, and rank
- 3-4x as likely to click on captions with Yes content, even though TRUTH = 55% Yes / 45% No
- Just considering top search result

User Behavior (Result skipping)
- Studied result skipping behavior
- Frequency with which people skipped caption w/answer to click other caption
- Distribution of clicks and skips by answer
- Users more likely (4x) to skip No to click Yes than vice versa
- Diagram: Caption 1 to Caption 4, labeled No, No, No, Yes

Answer Accuracy
- Examined accuracy of the top search result, as well as first click and last click in session
- Findings show:
  1. Top result accurate only 45% of time, less when truth is No
  2. Users improve accuracy, but only slightly (limited by top 10)

Summary of Main Findings
- We observed:
  - Engines more likely to rank Yes above No, and return more Yes
  - People much more likely to click on Yes than No, even when controlling for availability and rank position
  - Engine had wrong answer at top rank for half of questions*
  - * Given that answer present at top position (80% of queries)
- Caveats:
  - Findings for our particular set of Yes-No health questions
  - More work needed to validate with other question sets, domains beyond health, etc.

Discussion
- Possible causes for observed bias include:
  - Search engines use behavior (hurt by common misconceptions)
  - Ranking algorithms consider query match, e.g., for query "can acid reflux cause back pain?":
    - Yes docs w/ "Acid reflux can cause back pain" better match (6 of 6 terms) than No docs w/ "Acid reflux cannot cause back pain" (5 of 6 terms)
    - Annotation: missing from query

Conclusions
- Studied potential bias in user behavior and outcomes
- Showed effects on both from search engines
- 2% of queries are Yes-No questions; searchers want answers!
- To get users to accurate answers, engines should consider truth
- Future directions:
  - Study availability of Yes-No content online; move beyond Yes-No
  - Consider how truth should be determined and used in ranking
  - Follow-up user studies
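
The query-match argument from the Discussion slide can be illustrated with a small term-overlap count. This is only a sketch: the tokenizer and the function name `matched_query_terms` are assumptions made for illustration, and real ranking functions are far more elaborate than counting shared terms.

```python
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric terms (assumed tokenizer)."""
    return re.findall(r"[a-z0-9]+", text.lower())


def matched_query_terms(query, passage):
    """Return (number of query terms found in the passage, number of query terms)."""
    query_terms = tokenize(query)
    passage_terms = set(tokenize(passage))
    matched = [term for term in query_terms if term in passage_terms]
    return len(matched), len(query_terms)


QUERY = "can acid reflux cause back pain?"
YES_PASSAGE = "Acid reflux can cause back pain"
NO_PASSAGE = "Acid reflux cannot cause back pain"

print(matched_query_terms(QUERY, YES_PASSAGE))  # (6, 6)
print(matched_query_terms(QUERY, NO_PASSAGE))   # (5, 6): "can" does not match "cannot"
```

Under exact term matching, the No passage loses the query term "can" because "cannot" is a different token, which is the 6-of-6 versus 5-of-6 gap the slide describes.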
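
The talk defines bias as a distribution that significantly differs from the 55-45 Yes-No base rates. One way such a comparison could be made is a chi-square goodness-of-fit test, sketched below. The slides do not say which statistical test was used, so the choice of test, the function names, and the counts in the usage lines are all assumptions for illustration, not data from the study.

```python
# Critical value of the chi-square distribution for 1 degree of freedom at alpha = 0.05.
CRITICAL_VALUE_DF1_ALPHA_05 = 3.841


def chi_square_vs_base_rate(yes_count, no_count, yes_rate=0.55):
    """Chi-square statistic comparing observed Yes/No counts with the base rate."""
    total = yes_count + no_count
    expected_yes = total * yes_rate
    expected_no = total * (1 - yes_rate)
    return ((yes_count - expected_yes) ** 2 / expected_yes
            + (no_count - expected_no) ** 2 / expected_no)


def differs_from_base_rate(yes_count, no_count, yes_rate=0.55):
    """True if the observed split differs from the base rate at the 0.05 level."""
    statistic = chi_square_vs_base_rate(yes_count, no_count, yes_rate)
    return statistic > CRITICAL_VALUE_DF1_ALPHA_05


# Hypothetical counts: 80 clicks on Yes captions and 20 on No captions.
print(differs_from_base_rate(80, 20))
# Hypothetical counts that mirror the base rate exactly.
print(differs_from_base_rate(55, 45))
```

The 0.55 default comes from the physician-derived truth distribution on the slides; everything else here is a stand-in for whatever analysis the study actually ran.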