FASTA searches a protein or DNA sequence data bank
 36.3.4 Apr, 2011
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query: pF1KB7755, 446 aa
  1>>>pF1KB7755 446 - 446 aa - 446 aa
Library: human.CCDS.faa
  18511270 residues in 32554 sequences

Statistics: Expectation_n fit: rho(ln(x))= 8.4289+/-0.000812; mu= 6.2493+/- 0.049
 mean_var=244.6231+/-50.353, 0's: 0 Z-trim(116.6): 58 B-trim: 173 in 1/54
 Lambda= 0.082002
 statistics sampled from 17209 (17268) to 17209 sequences
Algorithm: FASTA (3.7 Nov 2010) [optimized]
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
 ktup: 2, E-join: 1 (0.817), E-opt: 0.2 (0.53), width: 16
 Scan time: 3.080

The best scores are:                                       opt  bits E(32554)
CCDS10428.1 SOX8 gene_id:30812|Hs108|chr16        ( 446) 3155 385.8 4.8e-107
CCDS11689.1 SOX9 gene_id:6662|Hs108|chr17         ( 509) 1178 151.9 1.3e-36
CCDS13964.1 SOX10 gene_id:6663|Hs108|chr22        ( 466)  869 115.4 1.3e-25
CCDS6159.1 SOX17 gene_id:64321|Hs108|chr8         ( 414)  482  69.5 7.1e-12

>>CCDS10428.1 SOX8 gene_id:30812|Hs108|chr16  (446 aa)
 initn: 3155 init1: 3155 opt: 3155 Z-score: 2035.3 bits: 385.8 E(32554): 4.8e-107
Smith-Waterman score: 3155; 100.0% identity (100.0% similar) in 446 aa overlap (1-446:1-446)

               10        20        30        40        50        60
pF1KB7 MLDMSEARSQPPCSPSGTASSMSHVEDSDSDAPPSPAGSEGLGRAGVAVGGARGDPAEAA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 MLDMSEARSQPPCSPSGTASSMSHVEDSDSDAPPSPAGSEGLGRAGVAVGGARGDPAEAA
               10        20        30        40        50        60

               70        80        90       100       110       120
pF1KB7 DERFPACIRDAVSQVLKGYDWSLVPMPVRGGGGGALKAKPHVKRPMNAFMVWAQAARRKL
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 DERFPACIRDAVSQVLKGYDWSLVPMPVRGGGGGALKAKPHVKRPMNAFMVWAQAARRKL
               70        80        90       100       110       120

              130       140       150       160       170       180
pF1KB7 ADQYPHLHNAELSKTLGKLWRLLSESEKRPFVEEAERLRVQHKKDHPDYKYQPRRRKSAK
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 ADQYPHLHNAELSKTLGKLWRLLSESEKRPFVEEAERLRVQHKKDHPDYKYQPRRRKSAK
              130       140       150       160       170       180

              190       200       210       220       230       240
pF1KB7 AGHSDSDSGAELGPHPGGGAVYKAEAGLGDGHHHGDHTGQTHGPPTPPTTPKTELQQAGA
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 AGHSDSDSGAELGPHPGGGAVYKAEAGLGDGHHHGDHTGQTHGPPTPPTTPKTELQQAGA
              190       200       210       220       230       240

              250       260       270       280       290       300
pF1KB7 KPELKLEGRRPVDSGRQNIDFSNVDISELSSEVMGTMDAFDVHEFDQYLPLGGPAPPEPG
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 KPELKLEGRRPVDSGRQNIDFSNVDISELSSEVMGTMDAFDVHEFDQYLPLGGPAPPEPG
              250       260       270       280       290       300

              310       320       330       340       350       360
pF1KB7 QAYGGAYFHAGASPVWAHKSAPSASASPTETGPPRPHIKTEQPSPGHYGDQPRGSPDYGS
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 QAYGGAYFHAGASPVWAHKSAPSASASPTETGPPRPHIKTEQPSPGHYGDQPRGSPDYGS
              310       320       330       340       350       360

              370       380       390       400       410       420
pF1KB7 CSGQSSATPAAPAGPFAGSQGDYGDLQASSYYGAYPGYAPGLYQYPCFHSPRRPYASPLL
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS10 CSGQSSATPAAPAGPFAGSQGDYGDLQASSYYGAYPGYAPGLYQYPCFHSPRRPYASPLL
              370       380       390       400       410       420

              430       440
pF1KB7 NGLALPPAHSPTSHWDQPVYTTLTRP
       ::::::::::::::::::::::::::
CCDS10 NGLALPPAHSPTSHWDQPVYTTLTRP
              430       440

>>CCDS11689.1 SOX9 gene_id:6662|Hs108|chr17  (509 aa)
 initn: 1081 init1: 586 opt: 1178 Z-score: 770.5 bits: 151.9 E(32554): 1.3e-36
Smith-Waterman score: 1243; 48.5% identity (67.0% similar) in 470 aa overlap (16-419:19-480)

       10 20 30 40 50
pF1KB7 MLDMSEARSQPPCSPSGTASSMSHVEDSDSDAPPSPAGSEGLGRAGVAVGGARGDP-
       :: : : . ::: .. :: .::. . .:.:
CCDS11 MNLLDPFMKMTDEQEKGLSG-APSPTMSEDSAGSPCPSGSGSDTENTRPQENTFPKGEPD
       10 20 30 40 50 60

       60 70 80 90 100 110
pF1KB7 --AEAADERFPACIRDAVSQVLKGYDWSLVPMPVRGGGGGALKAKPHVKRPMNAFMVWAQ
       :. ...::.:::.:::::::::::.::::::: .:.. : ::::::::::::::::
CCDS11 LKKESEEDKFPVCIREAVSQVLKGYDWTLVPMPVRVNGSS--KNKPHVKRPMNAFMVWAQ
       60 70 80 90 100 110

       120 130 140 150 160 170
pF1KB7 AARRKLADQYPHLHNAELSKTLGKLWRLLSESEKRPFVEEAERLRVQHKKDHPDYKYQPR
       :::::::::::::::::::::::::::::.::::::::::::::::::::::::::::::
CCDS11 AARRKLADQYPHLHNAELSKTLGKLWRLLNESEKRPFVEEAERLRVQHKKDHPDYKYQPR
       120 130 140 150 160 170

       180 190 200 210 220
pF1KB7 RRKSAKAGHSDSDSGAELGPHPGGGAVYKA--------EAGLGDGHHHGDHTGQTHGPPT
       ::::.: :..... ..: : . .:..:: .:... : :.:.::..::::
CCDS11 RRKSVKNGQAEAEEATEQ-THISPNAIFKALQADSPHSSSGMSEVHSPGEHSGQSQGPPT
       180 190 200 210 220 230

       230 240 250 260 270 280
pF1KB7 PPTTPKTELQQAGAKPELKLEGRRPVDSGRQN-IDFSNVDISELSSEVMGTMDAFDVHEF
       :::::::..: . : .:: ::: ..::: ::: .:::.::::.:......:::.::
CCDS11 PPTTPKTDVQPG--KADLKREGRPLPEGGRQPPIDFRDVDIGELSSDVISNIETFDVNEF
       240 250 260 270 280 290

       290 300 310 320
pF1KB7 DQYLPLGG-PA-PPEPGQA-YGGAY-------FHAGASPVWAHKS-APSA------SASP
       ::::: .: :. : ::. : :.: :.:. :: :. :: .: :
CCDS11 DQYLPPNGHPGVPATHGQVTYTGSYGISSTAATPASAGHVWMSKQQAPPPPPQQPPQAPP
       300 310 320 330 340 350

       330 340 350
pF1KB7 TETGPPRP---------------------------------HIKTEQPSPGHYGDQPRGS
       . .::.: :::::: ::.::..: . :
CCDS11 APQAPPQPQAAPPQQPAAPPQQPQAHTLTTLSSEPGQSQRTHIKTEQLSPSHYSEQQQHS
       360 370 380 390 400 410

       360 370 380 390 400 410
pF1KB7 PDYGSCS--GQSSATPAAPAGPFAGSQGDYGDLQ-ASSYYGAYPGYAPGLYQYPCFHSP-
       :. . : . .:. : :.. :: :: : : .::::. : . :::. . .:
CCDS11 PQQIAYSPFNLPHYSPSYP--PITRSQYDYTDHQNSSSYYSHAAGQGTGLYSTFTYMNPA
       420 430 440 450 460 470

       420 430 440
pF1KB7 RRPYASPLLNGLALPPAHSPTSHWDQPVYTTLTRP
       .::. .:.
CCDS11 QRPMYTPIADTSGVPSIPQTHSPQHWEQPVYTQLTRP
       480 490 500

>>CCDS13964.1 SOX10 gene_id:6663|Hs108|chr22  (466 aa)
 initn: 1010 init1: 579 opt: 869 Z-score: 573.4 bits: 115.4 E(32554): 1.3e-25
Smith-Waterman score: 1281; 50.3% identity (68.4% similar) in 475 aa overlap (2-446:10-466)

       10 20 30 40 50
pF1KB7 MLDMSEARSQPP-CSPSGTASSMSHVEDSDSDAPPSPAGSEGLGRAGVAVGG
       ...: . :. : : :.: :.. :. . . : : : :. : :
CCDS13 MAEEQDLSEVELSPVGSEEPRCLSPGSAPSLG--PDGGGGGSGLRA-SPGPGELG-KVKK
       10 20 30 40 50

       60 70 80 90 100 110
pF1KB7 ARGDPAEAADERFPACIRDAVSQVLKGYDWSLVPMPVRGGGGGALKAKPHVKRPMNAFMV
       . : .:: :..::.:::.::::::.::::.::::::: .: : :.:::::::::::::
CCDS13 EQQD-GEADDDKFPVCIREAVSQVLSGYDWTLVPMPVRVNG--ASKSKPHVKRPMNAFMV
       60 70 80 90 100 110

       120 130 140 150 160 170
pF1KB7 WAQAARRKLADQYPHLHNAELSKTLGKLWRLLSESEKRPFVEEAERLRVQHKKDHPDYKY
       ::::::::::::::::::::::::::::::::.::.::::.:::::::.:::::::::::
CCDS13 WAQAARRKLADQYPHLHNAELSKTLGKLWRLLNESDKRPFIEEAERLRMQHKKDHPDYKY
       120 130 140 150 160 170

       180 190 200 210
pF1KB7 QPRRRKSAKA--GHSDSDSG-AELGPHPGGGAVYKA------EAGLGDGHHHGD--H-TG
       ::::::..:: :... .: :: : . : ::. . : :. :. : .:
CCDS13 QPRRRKNGKAAQGEAECPGGEAEQGGTAAIQAHYKSAHLDHRHPGEGSPMSDGNPEHPSG
       180 190 200 210 220 230

       220 230 240 250 260 270
pF1KB7 QTHGPPTPPTTPKTELQQAGAKPELKLEGRRPVDSGRQNIDFSNVDISELSSEVMGTMDA
       :.:::::::::::::::.. : : : .:: ..:. .:::.::::.:.: :::..:..
CCDS13 QSHGPPTPPTTPKTELQSGKADP--KRDGRSMGEGGKPHIDFGNVDIGEISHEVMSNMET
       240 250 260 270 280 290

       280 290 300 310 320 330
pF1KB7 FDVHEFDQYLPLGG-PAP----PEPGQAYGGAYFHAGASPVWAHKSAPSASASPTETGPP
       ::: :.::::: .: :. : . :.: :.. .: : : . : :: ..::
CCDS13 FDVAELDQYLPPNGHPGHVSSYSAAGYGLGSALAVASGHSAWISK--PPGVALPT-VSPP
       300 310 320 330 340

       340 350 360 370 380
pF1KB7 ----RPHIKTEQ--PS-PGHYGDQPRGSP-DYGSCS--GQSSATPAAPAGPFAGSQGDYG
       . ..::: :. : :: ::: : : : : .:: :. : ::.
CCDS13 GVDAKAQVKTETAGPQGPPHYTDQPSTSQIAYTSLSLPHYGSAFPSISRPQF-----DYS
       350 360 370 380 390 400

       390 400 410 420 430 440
pF1KB7 DLQASSYYGAYPGYAPGLYQYPCFHSP-RRPYASPLLN-GLALPPAHSPTSHWDQPVYTT
       : : :. : .. : : :::. . .: .:: . . . . . : .:::: ::.::::::
CCDS13 DHQPSGPYYGHSGQASGLYSAFSYMGPSQRPLYTAISDPSPSGPQSHSPT-HWEQPVYTT
       410 420 430 440 450 460

pF1KB7 LTRP
       :.::
CCDS13 LSRP

>>CCDS6159.1 SOX17 gene_id:64321|Hs108|chr8  (414 aa)
 initn: 496 init1: 424 opt: 482 Z-score: 326.6 bits: 69.5 E(32554): 7.1e-12
Smith-Waterman score: 509; 34.6% identity (54.4% similar) in 364 aa overlap (55-373:5-352)

       30 40 50 60 70 80
pF1KB7 VEDSDSDAPPSPAGSEGLGRAGVAVGGARGDPAEAADERFPACIRDAVSQVLKGYD---W
       : . :.:.. . ..:. :. : :
CCDS61 MSSPDAGYASDDQ--SQTQSALPAVMAGLGPCPW
       10 20 30

       90 100 110 120
pF1KB7 --SLVP---MPVRG----------GGGGALKAKPHVKRPMNAFMVWAQAARRKLADQYPH
       :: : : :.: :..: :.. ...::::::::::. :..::.: :
CCDS61 AESLSPIGDMKVKGEAPANSGAPAGAAGRAKGESRIRRPMNAFMVWAKDERKRLAQQNPD
       40 50 60 70 80 90

       130 140 150 160 170 180
pF1KB7 LHNAELSKTLGKLWRLLSESEKRPFVEEAERLRVQHKKDHPDYKYQPRRRKSAK------
       :::::::: ::: :. :. .:::::::::::::::: .:::.:::.:::::..:
CCDS61 LHNAELSKMLGKSWKALTLAEKRPFVEEAERLRVQHMQDHPNYKYRPRRRKQVKRLKRVE
       100 110 120 130 140 150

       190 200 210 220 230
pF1KB7 AG--HSDSD-SGAELGPHPGGGAVYKAEAGLGDGHHHGDHTGQTHGPPT-PPTTPK--TE
       .: :. .. ..: :::. :: : : ::: . . : ::: :: .
CCDS61 GGFLHGLAEPQAAALGPE--GGRV--AMDGLG---LQFPEQGFPAGPPLLPPHMGGHYRD
       160 170 180 190 200

       240 250 260 270 280 290
pF1KB7 LQQAGAKPELKLEGRRPVDSGRQN-IDFSNVDISELSSEVMGTMDAFDVHEFDQYLPLGG
       :. :: : :.: :. . . .: . : . ... . : : .. . : .:
CCDS61 CQSLGAPP---LDGY-PLPTPDTSPLDGVDPDPAFFAAPMPGDCPAAGTYSYAQVSDYAG
       210 220 230 240 250 260

       300 310 320 330
pF1KB7 PAPPEPGQAY-------GGAYFHAGASPVWAHKSAPSASASPTETG-------PPRPHIK
       : : : . .: . . .: : . .: .:: : : . : .
CCDS61 PPEPPAGPMHPRLGPEPAGPSIPGLLAPPSALHVYYGAMGSPGAGGGRGFQMQPQHQHQH
       270 280 290 300 310 320

       340 350 360 370 380 390
pF1KB7 TEQPSPGHYGDQPRGSPDYGSCSGQSSATPAAPAGPFAGSQGDYGDLQASSYYGAYPGYA
       .: : : :: :. : .... :. ::
CCDS61 QHQHHPPGPG-QPSPPPEALPC--RDGTDPSQPAELLGEVDRTEFEQYLHFVCKPEMGLP
       330 340 350 360 370

       400 410 420 430 440
pF1KB7 PGLYQYPCFHSPRRPYASPLLNGLALPPAHSPTSHWDQPVYTTLTRP

CCDS61 YQGHDSGVNLPDSHGAISSVVSDASSAVYYCNYPDV
       380 390 400 410

446 residues in 1 query sequences
18511270 residues in 32554 library sequences
 Tcomplib [36.3.4 Apr, 2011] (8 proc)
 start: Fri Nov 4 09:23:25 2016 done: Fri Nov 4 09:23:25 2016
 Total Scan time: 3.080 Total Display time: 0.000

Function used was FASTA [36.3.4 Apr, 2011]