FASTA searches a protein or DNA sequence data bank
 36.3.4 Apr, 2011
Please cite:
 W.R. Pearson & D.J. Lipman PNAS (1988) 85:2444-2448

Query: pF1KB9723, 414 aa
  1>>>pF1KB9723 414 - 414 aa - 414 aa
Library: human.CCDS.faa
  18511270 residues in 32554 sequences

Statistics: Expectation_n fit: rho(ln(x))= 10.0886+/-0.000916; mu= -1.1031+/- 0.055
 mean_var=383.4931+/-79.841, 0's: 0 Z-trim(117.4): 57 B-trim: 169 in 1/51
 Lambda= 0.065493
 statistics sampled from 18044 (18100) to 18044 sequences
Algorithm: FASTA (3.7 Nov 2010) [optimized]
Parameters: BL50 matrix (15:-5), open/ext: -10/-2
 ktup: 2, E-join: 1 (0.833), E-opt: 0.2 (0.556), width: 16
 Scan time: 2.800

The best scores are:                                       opt  bits E(32554)
CCDS6159.1 SOX17 gene_id:64321|Hs108|chr8          ( 414) 2980 295.0 8.8e-80
CCDS5977.1 SOX7 gene_id:83595|Hs108|chr8           ( 388)  847  93.4 3.9e-19
CCDS13552.1 SOX18 gene_id:54345|Hs108|chr20        ( 384)  683  77.9 1.8e-14

>>CCDS6159.1 SOX17 gene_id:64321|Hs108|chr8 (414 aa)
 initn: 2980 init1: 2980 opt: 2980 Z-score: 1545.8 bits: 295.0 E(32554): 8.8e-80
Smith-Waterman score: 2980; 100.0% identity (100.0% similar) in 414 aa overlap (1-414:1-414)

               10        20        30        40        50        60
pF1KB9 MSSPDAGYASDDQSQTQSALPAVMAGLGPCPWAESLSPIGDMKVKGEAPANSGAPAGAAG
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 MSSPDAGYASDDQSQTQSALPAVMAGLGPCPWAESLSPIGDMKVKGEAPANSGAPAGAAG
               10        20        30        40        50        60

               70        80        90       100       110       120
pF1KB9 RAKGESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEE
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 RAKGESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFVEE
               70        80        90       100       110       120

              130       140       150       160       170       180
pF1KB9 AERLRVQHMQDHPNYKYRPRRRKQVKRLKRVEGGFLHGLAEPQAAALGPEGGRVAMDGLG
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 AERLRVQHMQDHPNYKYRPRRRKQVKRLKRVEGGFLHGLAEPQAAALGPEGGRVAMDGLG
              130       140       150       160       170       180

              190       200       210       220       230       240
pF1KB9 LQFPEQGFPAGPPLLPPHMGGHYRDCQSLGAPPLDGYPLPTPDTSPLDGVDPDPAFFAAP
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 LQFPEQGFPAGPPLLPPHMGGHYRDCQSLGAPPLDGYPLPTPDTSPLDGVDPDPAFFAAP
              190       200       210       220       230       240

              250       260       270       280       290       300
pF1KB9 MPGDCPAAGTYSYAQVSDYAGPPEPPAGPMHPRLGPEPAGPSIPGLLAPPSALHVYYGAM
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 MPGDCPAAGTYSYAQVSDYAGPPEPPAGPMHPRLGPEPAGPSIPGLLAPPSALHVYYGAM
              250       260       270       280       290       300

              310       320       330       340       350       360
pF1KB9 GSPGAGGGRGFQMQPQHQHQHQHQHHPPGPGQPSPPPEALPCRDGTDPSQPAELLGEVDR
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 GSPGAGGGRGFQMQPQHQHQHQHQHHPPGPGQPSPPPEALPCRDGTDPSQPAELLGEVDR
              310       320       330       340       350       360

              370       380       390       400       410
pF1KB9 TEFEQYLHFVCKPEMGLPYQGHDSGVNLPDSHGAISSVVSDASSAVYYCNYPDV
       ::::::::::::::::::::::::::::::::::::::::::::::::::::::
CCDS61 TEFEQYLHFVCKPEMGLPYQGHDSGVNLPDSHGAISSVVSDASSAVYYCNYPDV
              370       380       390       400       410

>>CCDS5977.1 SOX7 gene_id:83595|Hs108|chr8 (388 aa)
 initn: 775 init1: 525 opt: 847 Z-score: 456.9 bits: 93.4 E(32554): 3.9e-19
Smith-Waterman score: 847; 43.3% identity (61.9% similar) in 404 aa overlap (27-411:5-385)

       10 20 30 40 50
pF1KB9 MSSPDAGYASDDQSQTQSALPAVMAGLGPCPWAESLS-PIGDMKVK-GEAPANSGAPAGA
       :: :: :.: : : ... :..: : :
CCDS59 MASLLGAYPWPEGLECPALDAELSDGQSPPAVPRPPGD
       10 20 30

       60 70 80 90 100 110
pF1KB9 AGRAKGESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAELSKMLGKSWKALTLAEKRPFV
       : .::::::::::::::::::::::: :::::::::::::::::::::::..:::.:
CCDS59 KG---SESRIRRPMNAFMVWAKDERKRLAVQNPDLHNAELSKMLGKSWKALTLSQKRPYV
       40 50 60 70 80 90

       120 130 140 150 160 170
pF1KB9 EEAERLRVQHMQDHPNYKYRPRRRKQVKRL-KRVEGGFL-HGLAEPQAAALGPEGGRVAM
       .::::::.:::::.:::::::::.::.::: :::. ::: .:.. : : :: .
CCDS59 DEAERLRLQHMQDYPNYKYRPRRKKQAKRLCKRVDPGFLLSSLSRDQNAL--PEKRSGSR
       100 110 120 130 140 150

       180 190 200 210 220
pF1KB9 DGLGLQFPEQGFPAGPPLLPPHMGGHYRDCQSLGA---PP--LDGYP--LPTP-DTSPLD
       .:: . . . : : : . : :.. . :. : .: :: :::: . ::::
CCDS59 GALGEKEDRGEYSPGTAL--PSLRGCYHEGPAGGGGGGTPSSVDTYPYGLPTPPEMSPLD
       160 170 180 190 200 210

       230 240 250 260 270 280
pF1KB9 GVDPDPAFFAAPMPGDCPAAGTYSYAQVSDYAGPPEPPAGPMHPRLGPEPAGPSIP-GLL
       ..:. .::..: : . :. :. :. .: : : : : :
CCDS59 VLEPEQTFFSSP----CQEEHGHPRRI-------PHLPGHPYSPEYAPSPLHCSHPLGSL
       220 230 240 250 260

       290 300 310 320 330 340
pF1KB9 APPSALHVYYGAMGSPGAGGGRGFQMQPQHQHQHQHQHHPPGPGQPSPPPEALPCRDGTD
       : .. : .: :: : . . .. :.. :: ::::: : :. :
CCDS59 ALGQSPGV---SMMSPVPGCPPSPAYYSPATYHPLHSNLQAHLGQLSPPPEH-PGFDALD
       270 280 290 300 310

       350 360 370 380 390 400
pF1KB9 PSQPAELLGEVDRTEFEQYLHFVCKPEMG---LPYQGHD--SGVN-LPDSHGAISSVVSD
       . .::::..::.::.:::. .:. . . .:: : :. .. .. ::..:
CCDS59 QLSQVELLGDMDRNEFDQYLNTPGHPDSATGAMALSGHVPVSQVTPTGPTETSLISVLAD
       320 330 340 350 360 370

       410
pF1KB9 ASSAVYYCNYPDV
       :. :.:: .:
CCDS59 AT-ATYYNSYSVS
       380

>>CCDS13552.1 SOX18 gene_id:54345|Hs108|chr20 (384 aa)
 initn: 849 init1: 541 opt: 683 Z-score: 373.3 bits: 77.9 E(32554): 1.8e-14
Smith-Waterman score: 859; 44.0% identity (59.6% similar) in 423 aa overlap (3-408:23-378)

       10 20 30 40
pF1KB9 MSSPDAGYASDDQSQTQSALPAVMAGLGPCPWAESLSPIG
       .: : :.: .. .: ::..: .: : ::
CCDS13 MQRSPPGYGAQDDPPARRDCAWAPGHGAAAD--TRGLAAGPAALA--APAAPASPPSP-Q
       10 20 30 40 50

       50 60 70 80 90
pF1KB9 DMKVKGEAPANSG-APAGAAGR-AKGESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAEL
       .. :. : .::: . : : :::::::::::::::::::::::::::::::: :
CCDS13 RSPPRSPEPGRYGLSPAGRGERQAADESRIRRPMNAFMVWAKDERKRLAQQNPDLHNAVL
       60 70 80 90 100 110

       100 110 120 130 140 150
pF1KB9 SKMLGKSWKALTLAEKRPFVEEAERLRVQHMQDHPNYKYRPRRRKQVKRLKRVEGGFL-H
       ::::::.:: :. :::::::::::::::::..:::::::::::.::... .:.: :.:
CCDS13 SKMLGKAWKELNAAEKRPFVEEAERLRVQHLRDHPNYKYRPRRKKQARKARRLEPGLLLP
       120 130 140 150 160 170

       160 170 180 190 200 210
pF1KB9 GLAEPQAAALGPEGGRVAMDGLGLQFPEQGFPAGPPLLPPHMGGHYRDCQSLGAPPLDGY
       ::: :: : : . :::. . .:. ::: .::
CCDS13 GLAPPQ-----P--------------PPEPFPAASG-----SARAFRELPPLGAE-FDGL
       180 190 200 210

       220 230 240 250 260 270
pF1KB9 PLPTPDTSPLDGVDP-DPAFFAAPM-PGDC---PAAGTYSYAQVS-DYAGPPEPPAGPMH
       ::::. :::::..: . ::: : : :: : . :. ...: : .: : .
CCDS13 GLPTPERSPLDGLEPGEAAFFPPPAAPEDCALRPFRAPYAPTELSRDPGGCYGAPLAEAL
       220 230 240 250 260 270

       280 290 300 310 320 330
pF1KB9 PRLGPEPAGPSIPGLLAPPSALHVYYGAMGSPGAGGGRGFQMQPQHQHQHQHQHHPPGPG
       : .: ::.: . :: :::..:.:: : ::
CCDS13 -RTAP-PAAP-LAGL---------YYGTLGTPG-----------------------PYPG
       280 290

       340 350 360 370 380
pF1KB9 QPSPPPEALPCRDGTDPSQPA-ELLGEVDRTEFEQYLHFV-CKPEM-GLPY-----QGHD
       :::::: : ....: :: .: ..:: :::.:::. .:. :::: .
CCDS13 PLSPPPEAPPL-ESAEPLGPAADLWADVDLTEFDQYLNCSRTRPDAPGLPYHVALAKLGP
       300 310 320 330 340 350

       390 400 410
pF1KB9 SGVNLPDSHGAISSVVSDASSAVYYCNYPDV
       ... :. . ::.. :::::::::
CCDS13 RAMSCPEESSLISAL-SDASSAVYYSACISG
       360 370 380

414 residues in 1 query sequences
18511270 residues in 32554 library sequences
 Tcomplib [36.3.4 Apr, 2011] (8 proc)
 start: Fri Nov 4 18:32:10 2016 done: Fri Nov 4 18:32:10 2016
 Total Scan time: 2.800 Total Display time: -0.020

Function used was FASTA [36.3.4 Apr, 2011]