UTF-8: character encoding

Unicode supports almost all existing character sets. The most convenient form for encoding Unicode characters is UTF-8. It offers compatibility with ASCII, resistance to data corruption, efficiency, and ease of processing. But first things first.

Encoding forms

Computers handle numbers not only as abstract mathematical objects but also as combinations of fixed-size units of storage and processing: bytes and 32-bit words. An encoding standard has to take this into account when it defines how character numbers are represented.

In computer systems, integers are stored in memory cells of 8 bits (1 byte), 16 bits, or 32 bits. Each Unicode encoding form defines which sequence of memory cells represents the integer that corresponds to a given character. The standard provides three encoding forms for Unicode characters, built on 8-bit, 16-bit, and 32-bit units. They are known as UTF-8, UTF-16, and UTF-32 respectively. The name UTF stands for Unicode Transformation Format. Each of the three forms is an equally valid way of representing Unicode characters, and each has advantages in different applications.

All three encodings can represent every character in the Unicode standard. Solutions that use different encoding forms for different reasons are therefore fully interoperable. Text in any one encoding can be converted exactly into either of the other two without loss of data.
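The lossless conversion described above can be checked with a short round trip. This is a minimal sketch in Python, which the article itself does not use; the sample string is an arbitrary assumption, and the little-endian codec names are chosen only to avoid a byte order mark in the output.

```python
# Round-trip a string through the three Unicode encoding forms.
# The sample text is an arbitrary assumption chosen to mix 1-, 2-, 3- and 4-byte characters.
text = "Unicode: Д, 日本語, 😀"

utf8 = text.encode("utf-8")
utf16 = utf8.decode("utf-8").encode("utf-16-le")
utf32 = utf16.decode("utf-16-le").encode("utf-32-le")
round_trip = utf32.decode("utf-32-le").encode("utf-8")

# Both comparisons hold if no data was lost along the way.
print(round_trip == utf8)
print(round_trip.decode("utf-8") == text)
```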

Non-overlap principle

Each of the Unicode encoding forms is designed so that code units never partially overlap. By contrast, Windows code page 932 builds characters from one or two bytes. The length of a sequence depends on its first byte, so the lead-byte values of two-byte sequences and the single-byte values are disjoint. However, the value of a single byte and the value of a trailing byte can coincide. This means, for example, that a search for the letter D (hexadecimal code 44) can wrongly match the second half of the two-byte sequence for the Cyrillic letter "Д" (hexadecimal code 84 44). To work out which interpretation is correct, the program has to look at the preceding bytes.

The situation gets worse when lead-byte and trailing-byte values can also coincide. To remove the ambiguity, the program then has to scan backward until it reaches the start of the text or an unambiguous code sequence. This is not only inefficient but also fragile, since a single wrong byte can make all of the text that follows unreadable.

The Unicode transformation formats avoid this problem because the values of lead units, trailing units, and single units are all distinct. This guarantees that Unicode searches and comparisons never give wrong results because parts of different character codes happen to coincide. Adherence to the non-overlap principle is what sets these encoding forms apart from the East Asian multi-byte encodings.
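The false match from the code page 932 example can be reproduced directly. The sketch below is an illustration in Python, not part of the original article; it relies on Python's built-in `cp932` codec and uses a naive byte search.

```python
# Naive byte search for "D" inside the encoded form of Cyrillic "Д" (U+0414).
needle = "D"
haystack = "Д"

legacy = haystack.encode("cp932")    # two bytes: 84 44
unicode8 = haystack.encode("utf-8")  # two bytes: D0 94

# In code page 932 the trailing byte 0x44 is also the code for "D".
false_match_cp932 = needle.encode("cp932") in legacy
# In UTF-8 every byte of a multi-byte sequence is 0x80 or above, so ASCII never matches.
false_match_utf8 = needle.encode("utf-8") in unicode8

print(legacy.hex(), false_match_cp932)
print(unicode8.hex(), false_match_utf8)
```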

Another consequence of non-overlap is that every character has a clearly defined boundary. There is no need to scan an unbounded number of preceding characters. This property is sometimes called self-synchronization. Corruption of one code unit corrupts only one character, and the surrounding characters stay intact. In the 8-bit transformation format, if a pointer lands on a byte of the form 10xxxxxx (in binary), finding the start of the character takes between one and three steps backward.
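The backward scan described above might look like the following. This is a minimal sketch that assumes well-formed UTF-8 input; the function name `char_start` is hypothetical and not taken from any library.

```python
def char_start(data: bytes, index: int) -> int:
    """Return the index of the first byte of the character that contains data[index].

    Assumes data is well-formed UTF-8, so at most three backward steps are needed.
    """
    steps = 0
    # Continuation bytes have the bit pattern 10xxxxxx.
    while index > 0 and (data[index] & 0xC0) == 0x80 and steps < 3:
        index -= 1
        steps += 1
    return index


data = "a€😀".encode("utf-8")  # 1 + 3 + 4 bytes
print(char_start(data, 3))  # last byte of "€"
print(char_start(data, 7))  # last byte of "😀"
```

With this sample, index 3 resolves to 1 (the lead byte of "€") and index 7 resolves to 4 (the lead byte of "😀").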

Conformance

The Unicode Consortium fully supports all three encoding forms. It is important not to set UTF-8 against Unicode, because all of the transformation formats are equally valid embodiments of the Unicode character encoding standard.

Byte orientation

UTF-32 represents each character with a single 32-bit code unit whose value equals the Unicode code point. UTF-16 uses one or two 16-bit units. UTF-8 uses from one to four bytes.
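The code unit counts above can be verified per character. The following is an illustrative Python sketch; the helper name `code_units` and the sample characters are assumptions made for this example.

```python
def code_units(ch: str) -> dict:
    """Count the code units needed for one character in each encoding form."""
    return {
        "UTF-8": len(ch.encode("utf-8")),            # 8-bit units
        "UTF-16": len(ch.encode("utf-16-le")) // 2,  # 16-bit units
        "UTF-32": len(ch.encode("utf-32-le")) // 4,  # 32-bit units
    }


for ch in ("A", "Д", "€", "😀"):
    # The single UTF-32 unit holds the code point itself.
    unit = int.from_bytes(ch.encode("utf-32-le"), "little")
    print(f"U+{ord(ch):04X}", code_units(ch), unit == ord(ch))
```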

UTF-8 was designed to be compatible with byte-oriented, ASCII-based systems. Most existing software and much of information technology practice has long relied on representing characters as sequences of bytes. Many protocols depend on the ASCII values staying unchanged, and either use special control characters or avoid them. A simple way to adapt Unicode to such environments is to use an 8-bit encoding that represents every Unicode character without disturbing or reusing any ASCII or control character value. UTF-8 was created for exactly this purpose.

Variable length

UTF-8 is a variable-length encoding made up of 8-bit storage units, in which the high bits of each byte indicate which part of a sequence that byte belongs to. One range of values is reserved for the first element of a code sequence, and another range for the elements that follow. This is what keeps the encoding free of overlap.
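The role of the high bits can be made concrete with a small classifier. This is a sketch, not a validator: it looks only at bit patterns, so it does not reject lead bytes such as 0xC0, 0xC1, or 0xF5 to 0xF7 that never occur in well-formed UTF-8. The function name `classify` is hypothetical.

```python
def classify(byte: int) -> str:
    """Classify one UTF-8 byte by its high bits (bit patterns only, no validation)."""
    if (byte & 0x80) == 0x00:
        return "single"        # 0xxxxxxx: a complete one-byte character
    if (byte & 0xC0) == 0x80:
        return "continuation"  # 10xxxxxx: second or later byte of a sequence
    if (byte & 0xE0) == 0xC0:
        return "lead-2"        # 110xxxxx: starts a two-byte sequence
    if (byte & 0xF0) == 0xE0:
        return "lead-3"        # 1110xxxx: starts a three-byte sequence
    if (byte & 0xF8) == 0xF0:
        return "lead-4"        # 11110xxx: starts a four-byte sequence
    return "invalid"


print([classify(b) for b in "a€".encode("utf-8")])
```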

ASCII

UTF-8 fully preserves the ASCII codes (0x00-0x7F). Unicode characters U+0000 to U+007F are converted to the single bytes 0x00-0x7F in UTF-8 and are therefore indistinguishable from ASCII. In addition, to avoid ambiguity, the values 0x00-0x7F are never used in any byte of a multi-byte character. Characters in the range U+0080 to U+07FF, which covers most non-ideographic alphabets beyond ASCII (for example Latin with diacritics, Greek, Cyrillic, Hebrew, and Arabic), are encoded as two-byte sequences. Characters in the range U+0800 to U+FFFF are represented by three bytes, and supplementary characters with code points above U+FFFF need four bytes.
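The ranges above translate into a compact encoder. The sketch below is written for illustration and checked against Python's built-in codec; `encode_utf8` is a hypothetical name, and real programs would normally just call `str.encode("utf-8")`.

```python
def encode_utf8(cp: int) -> bytes:
    """Encode one Unicode code point as UTF-8 by hand."""
    if cp < 0 or cp > 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError("not a Unicode scalar value")
    if cp <= 0x7F:      # U+0000 to U+007F: one byte, identical to ASCII
        return bytes([cp])
    if cp <= 0x7FF:     # U+0080 to U+07FF: two bytes
        return bytes([0xC0 | (cp >> 6), 0x80 | (cp & 0x3F)])
    if cp <= 0xFFFF:    # U+0800 to U+FFFF: three bytes
        return bytes([0xE0 | (cp >> 12),
                      0x80 | ((cp >> 6) & 0x3F),
                      0x80 | (cp & 0x3F)])
    return bytes([0xF0 | (cp >> 18),  # U+10000 to U+10FFFF: four bytes
                  0x80 | ((cp >> 12) & 0x3F),
                  0x80 | ((cp >> 6) & 0x3F),
                  0x80 | (cp & 0x3F)])


for ch in "AД€😀":
    print(f"U+{ord(ch):04X}", encode_utf8(ord(ch)).hex(" "))
```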

Areas of application

UTF-8 is usually the preferred encoding for HTML and similar protocols.

XML was the first standard with full support for UTF-8. Standards organizations also recommend it. The problem of supporting non-ASCII characters in URLs was resolved when the W3C and the IETF agreed that all URLs should be encoded exclusively in UTF-8.
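In practice, non-ASCII characters in a URL are first encoded as UTF-8 and each resulting byte is then percent-escaped. A minimal sketch using Python's standard `urllib.parse` module, whose `quote` function uses UTF-8 by default; the path below is a made-up example.

```python
from urllib.parse import quote, unquote

# Hypothetical path containing a Cyrillic letter.
path = "/search/Д"

encoded = quote(path)       # UTF-8 bytes D0 94 become %D0%94; "/" is left as is
decoded = unquote(encoded)  # decodes the escapes as UTF-8 again

print(encoded)
print(decoded == path)
```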

Compatibility with ASCII eases the transition to new software. Most text editors work with UTF-8, including JEdit, Emacs, BBEdit, Eclipse, and Notepad in Windows. No other Unicode encoding form can boast this level of tool support.

An advantage of the encoding is that it consists of a plain sequence of bytes. UTF-8 strings are easy to handle in C and other programming languages. It is also the only encoding form that needs neither a byte order mark (BOM) nor an encoding declaration in XML.

Self-synchronization

In environments that process characters in 8-bit units, UTF-8 has the following advantages over other multi-byte character sets:

  • The first byte of a code sequence carries information about its length. This makes forward scanning more efficient.
  • The start of a character is easy to find, because the initial byte is restricted to a fixed range of values.
  • Byte values do not overlap.
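The first point in the list can be illustrated by walking a byte string character by character, reading each length from the lead byte alone. This is a sketch with hypothetical helper names; it checks lead bytes but does not validate continuation bytes.

```python
def sequence_length(lead: int) -> int:
    """Return the length of a UTF-8 sequence, judged from its first byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"0x{lead:02X} cannot start a UTF-8 sequence")


def split_characters(data: bytes) -> list:
    """Split UTF-8 bytes into one bytes object per character, scanning forward."""
    chars = []
    i = 0
    while i < len(data):
        n = sequence_length(data[i])
        chars.append(data[i:i + n])
        i += n
    return chars


print(split_characters("aД€😀".encode("utf-8")))
```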

Comparing the advantages

UTF-8 is compact. It is at a size disadvantage only for East Asian text (Chinese, Japanese, and Korean, written with Han ideographs), where each character takes a three-byte sequence. UTF-8 is also slower to process than the other encoding forms. On the other hand, a binary sort of UTF-8 strings gives the same order as a binary sort by Unicode code point.
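Both claims, the size cost for East Asian text and the preserved sort order, can be checked quickly. The sample strings below are assumptions picked for this sketch; UTF-16 is included only as a contrast, because its byte order differs from code point order once supplementary characters are involved.

```python
# Size: three Han ideographs take 9 bytes in UTF-8 but 6 bytes in UTF-16.
cjk = "日本語"
print(len(cjk.encode("utf-8")), len(cjk.encode("utf-16-le")))

# Sort order: Python compares str values by code point.
samples = ["\uFFFD", "\U00010000", "A", "Д"]
by_code_point = sorted(samples)
by_utf8 = sorted(samples, key=lambda s: s.encode("utf-8"))
by_utf16 = sorted(samples, key=lambda s: s.encode("utf-16-be"))

print(by_utf8 == by_code_point)   # binary UTF-8 order matches code point order
print(by_utf16 == by_code_point)  # UTF-16 order differs for this sample
```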

Character encoding scheme

A character encoding scheme consists of a character encoding form plus a rule for serializing the code units into bytes. To identify the encoding scheme, the Unicode standard allows an initial byte order mark (BOM) to be used.

In UTF-8 the role of the BOM is limited to signalling which encoding form is in use. UTF-8 has no byte order problem, because its code unit is a single byte. Using a BOM with this encoding form is neither required nor recommended. A BOM may still appear in text that was converted from encodings that use a byte order mark, or it may serve as a UTF-8 signature. It is the three-byte sequence EF BB BF (hexadecimal).
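Detecting and removing the signature takes only a few lines. A minimal sketch using Python's standard `codecs` module; the helper name `strip_utf8_bom` is hypothetical.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 signature (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "text".encode("utf-8")
print(strip_utf8_bom(with_bom))

# The built-in "utf-8-sig" codec drops the signature while decoding.
print(with_bom.decode("utf-8-sig") == "text")
```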

How to set the UTF-8 encoding

In HTML, the UTF-8 encoding is declared with the following code:

<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>

In PHP, the UTF-8 encoding is set with the header() function at the start of the file, after the error reporting level has been set:

<?php

error_reporting(-1);

header('Content-Type: text/html; charset=utf-8');

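To connect to a MySQL database, the UTF-8 encoding is set as follows:

<?php

// mysql_set_charset() belongs to the old mysql extension, which was removed in PHP 7;
// with mysqli the equivalent call is mysqli_set_charset($link, 'utf8'), where $link is the connection.
mysql_set_charset('utf8');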

In a CSS file, the UTF-8 character encoding is specified like this:

@charset "utf-8";

When saving files of any type, choose UTF-8 without a BOM, otherwise the site may not work correctly (for example, the BOM bytes are sent as output before PHP can call header()). To do this in Dreamweaver, select the menu item "Modify - Page Properties - Title/Encoding" and switch the encoding to UTF-8. Then reload the page, clear the "Include Unicode Signature (BOM)" check box, and apply the changes. If any text on a page or in the database was entered in a different encoding, it has to be re-entered or re-encoded. When working with regular expressions, be sure to use the u modifier.
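Re-encoding existing text means decoding it with the encoding it was actually written in and then encoding the result as UTF-8. The sketch below is in Python rather than PHP and assumes the legacy data is in Windows-1251; both the source encoding and the function name `reencode_to_utf8` are assumptions made for illustration.

```python
def reencode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Convert bytes from a known legacy encoding to UTF-8 (without a BOM)."""
    text = data.decode(source_encoding)  # raises UnicodeDecodeError if the guess is wrong
    return text.encode("utf-8")


# Assumed example: six Cyrillic letters stored as Windows-1251, one byte each.
legacy = "Привет".encode("cp1251")
converted = reencode_to_utf8(legacy, "cp1251")

print(len(legacy), len(converted))
print(converted.decode("utf-8"))
```

The source encoding has to be known in advance; decoding with the wrong one may succeed silently and produce garbage.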

You can also save a file as UTF-8 in Windows Notepad. Choose the menu item "File - Save As...", select the required encoding, and save the file in UTF-8.

In the Notepad++ text editor, if an encoding other than UTF-8 is set, use the menu item "Convert to UTF-8 without BOM" to change the encoding, then save the file in UTF-8.

No alternative

In the context of globalization, where political and linguistic borders are blurring, character sets with purely local features are of little use. Unicode is the single character set that supports all localizations. UTF-8 is an example of a well-designed implementation of Unicode, which:

  • is supported by a wide range of tools and is compatible with the ASCII encoding;
  • is resistant to data corruption;
  • is simple and efficient to process;
  • is platform independent.

With the arrival of UTF-8, the debate about which encoding form or character set is better has become pointless.

Copyright © 2018 ha.birmiss.com. Theme powered by WordPress.